CONTRIBUTIONS TO 
MATHEMATICAL STATISTICS



WILEY PUBLICATIONS IN STATISTICS 

Walter A. Shewhart, Editor 

Mathematical Statistics

FISHER—Contributions to Mathematical Statistics. 

WALD—Statistical Decision Functions. 

FELLER—An Introduction to Probability Theory 
and Its Applications, Volume One. 

HOEL—Introduction to Mathematical Statistics. 

WALD—Sequential Analysis. 

Applied Statistics

HALD—Statistics (in press).

TIPPETT—Technological Applications of Statistics (in press).

DEMING—Some Theory of Sampling.

COCHRAN and COX—Experimental Designs. 

DODGE and ROMIG—Sampling Inspection Tables. 

RICE—Control Charts. 

Related Books of Interest to Statisticians 

HAUSER and LEONARD—Government Statistics
for Business Use. 





R. A. Fisher

DEPARTMENT OF GENETICS 
UNIVERSITY OF CAMBRIDGE 


CONTRIBUTIONS TO 
MATHEMATICAL STATISTICS 


INDEX PREPARED BY 

John Tukey 

PRINCETON UNIVERSITY 


NEW YORK 1950 
John Wiley & Sons, Inc. 
Chapman & Hall, Ltd. London 



Copyright, 1950 

BY 

John Wiley & Sons, Inc. 

All Rights Reserved 

This book or any part thereof must not 
be reproduced in any form without 
the written permission of the publisher. 


Copyright, Canada, 1950, International Copyright, 1950
John Wiley & Sons, Inc., Proprietors

All Foreign Rights Reserved 
Reproduction in whole or in part forbidden. 


PRINTED in the UNITED STATES OF AMERICA 



EDITOR’S PREFACE 


Like so many others, for more than a quarter of a century I have
found inspiration and guidance in the original contributions of
Professor R. A. Fisher. Some of my reprints of his early papers have been
almost worn out through constant use, and for several years I have
wished that I had a bound volume of his papers, particularly if he
could be persuaded to preface each article with a brief note setting it
in the proper perspective as of today.

After some persuasion, Professor Fisher agreed in the fall of 1946
to undertake the selection of what he now considers to be his most
outstanding contributions and to provide brief introductory notes
wherever he felt that such information would be of help to students
of his work. The present book is the result. Typographical and other
errors in the original papers have also been corrected by the author
and some obscure passages have been clarified.

A word of explanation for the method of reproduction is perhaps
in order. I had come to have a feeling not only for the content but
also for the typographical form in which many of the articles first
appeared in the journals of the many different learned societies. One
of my principal reasons, therefore, for suggesting the photographic
reproduction of these papers just as they appeared, even as to page
size, was to preserve for posterity the flavor of the original papers
as well as their content, both of which I had come to cherish.
Fortunately this method of reproduction also reduced the cost of
production so that this book could be put out at a price within the reach
of the many students who will want to own a copy.

Then came the question of providing a thoroughgoing subject matter
index that would make it possible for a student to look up a
particular topic in Fisher’s work. I am indebted to Professor John
Tukey for the preparation of such an index.

No comprehensive survey of Fisher’s work would be complete without
a brief biographical note outlining the highlights in his development
as a statistician whose work has carried him into so many
fields. Just such a biographical note had been prepared by Professor
P. C. Mahalanobis with the cooperation of Professor Fisher on another
occasion and published in Sankhyā. With the gracious consent
of Professor Mahalanobis, this note has been reproduced.

As editor, I wish to express to the editors of the many different
scientific journals represented herein my appreciation of their
courtesy in permitting us to reproduce the articles contained in this
volume.

Walter A. Shewhart 

April, 1950




PREFACE 


In mathematics, more than in other disciplines, original sources often
tend to be neglected, and sometimes are scarcely accessible to
students. In more leisured times the best method of making original
papers accessible has been the voluminous “collected works” often
published by learned societies in memory of their distinguished
members. There is, I believe, a great deal to be said for the suggestion,
which John Wiley & Sons, Inc., made to me, that editorial notes made
by the author in his lifetime might do much to aid the student,
especially in understanding the original purpose in relation to the common
knowledge, or common misapprehensions, current at the time the
paper was written.

I have tried, in annotating each paper, to recall these circumstances
accurately, and to give cross references to related papers. The
diversity of subject matter will be forgiven, I believe, by readers who
realise that deep-seated misapprehensions can only be removed by
clarification over a wide front, and from many points of view.
Correct computational technique; accurate mathematical analysis; the
development of criteria and abstract concepts related to aims and
choice, in terms of which the analysis may be unified; and the
beginnings of an operational inductive logic, have each required exposition
and exemplification, sometimes not unattended by controversy. In
each of these fields there is still much to be done. I am still too often
confronted by problems, even in my own research, to which I cannot
confidently offer a solution, ever to be tempted to imply that finality
has been reached (or to take very seriously this claim when made by
others!). The formal mathematical treatises on the subject, of which
several good examples are available, cannot easily cope with the variety
of aspects and interests, each of which may afford opportunities for
far-reaching advances. Therein lies the particular importance of
studying the points of view and the methodological approaches by
which advances have been made in the past. In some cases the title
is sufficiently explanatory, and no further note will be needed.

I have, of course, chosen for reprinting the papers most likely to
be of service. A fuller bibliography has been published from time
to time in the successive editions of the general introductory
textbook, Statistical Methods for Research Workers.

R. A. Fisher 

Cambridge, England 

April, 1950 




BIOGRAPHY


The following biography of Professor Fisher is reprinted from Sankhyā
(Vol. 4, Pt. 2, pp. 265-272, December, 1938).



PROFESSOR RONALD AYLMER FISHER 

Early Days. 

Ronald Aylmer Fisher was born on the 17th of February, 1890, in East Finchley, one 
of the northern suburbs of London, where his father had a large establishment. He was 
the youngest of seven children which would have been eight if his twin brother had 
lived. His father, G. Fisher, of the well-known firm of auctioneers, Robinson and 
Fisher in King Street, St. James’s, was a man of extraordinary energy and had a wide 
and detailed knowledge of the fine arts. The father’s family were mostly businessmen ; 
but an uncle, the younger brother of his father, was placed high as a Cambridge Wrangler 
and entered the Church. The mother’s father was a successful London solicitor noted 
for his social qualities. Besides sturdy common sense and interest in practical affairs,
taste for mathematics and the fine arts, Fisher inherited a spirit of adventure from his 
family. His mother’s only brother threw up excellent prospects in London to collect 
wild animals in Africa, and one of his brothers returned from the Argentine to serve in 
the European War and was killed in 1915. Extremely defective vision saved Fisher 
himself from a similar fate, for, though in 1914 he had qualified himself for commission, 
he was refused throughout the war the opportunity of military service. 

As with many mathematicians, Fisher’s special ability showed at an early age. Before 
he was six, his mother was reading him a popular book on Astronomy, an interest which 
he followed eagerly during boyhood. Love of mathematics dominated his educational 
career ; and he was fortunate in coming under the tuition of a brilliant mathematical 
teacher, W. N. Roe of Stanmore Park, also well-known in England as a Somersetshire 
cricketer. At Harrow he worked under C. H. P. Mayo and W. N. Roseveare. The
peculiar circumstances of the teaching by the latter will be of interest to statisticians 
familiar with Fisher’s geometrical methods. On account of his eyes he was forbidden 
to work by artificial light, and when he would go to work with Mr. Roseveare in the 
evenings the instruction was given purely by ear without the use of paper and pencil or 
any other visual aid. To this early training may be attributed Fisher’s tendency to use 
hypergeometrical representation, and his great power in penetrating into problems requiring
geometrical intuition. It is, perhaps, a consequence of this that throughout life his 
solutions have been singularly independent of symbolism. He does not usually attempt 
to write down the analysis until the problem is solved in his mind, and sometimes, he 
confesses, after the key to the solution has been forgotten. 

Cambridge.

Though born to sufficiently wealthy parents, owing to financial difficulties of the 
family at the time, it would have been difficult for him to proceed to the University without 
the aid of scholarships which he succeeded in winning. He joined the Gonville and Caius 
College, Cambridge, in 1909 ; and passed the Mathematical Tripos Part II in 1912 as a
Wrangler with distinction in the optical papers in schedule B. He spent another year at
Cambridge with a Studentship in Physics, and studied statistical mechanics and quantum
theory under James Jeans, and the theory of errors under F. J. M. Stratton.

At this period he was consciously out of sympathy with certain tendencies prevalent
at the time in the University. Under the influence of Bateson it was widely believed
that recent work on genetics had discredited the evolutionary principles developed by
Darwin. Fisher became keenly interested in Mendelian evidence, and was convinced that
this furnished a basis for a quantitative and mathematical treatment of the theory of
evolution. He began at this early date a series of studies of which the outside world
knew nothing until the publication in 1930 of his book on “The Genetical Theory of
Natural Selection.”

The second influence at Cambridge with which Fisher found himself out of sympathy
was the recent passage of control in mathematical teaching from the earlier tradition
of mathematical physicists to a school of pure mathematicians of largely continental
derivation. The explicit statement of a rigorous argument interested him, but only on
the important condition that such explicit demonstration of rigour was needed. Mechanical
drill in the technique of rigorous statement was abhorrent to him, partly for its pedantry,
and partly as an inhibition to the active use of the mind. He felt it was more important to
think actively, even at the expense of occasional errors from which an alert intelligence
would soon recover, than to proceed with perfect safety at a snail’s pace along well-known
paths with the aid of the most perfectly designed mechanical crutches. The real
interpretation of his state of mind at this time is probably that he was fitting himself for
mathematical research rather than for mathematical teaching. Fisher himself thinks that
he was merely a very wilful and impatient young man. This is no doubt true, but he was
impatient not because he was young but because he was a creative genius.

Early Work in Statistics. 

Although he had a successful academic career at Cambridge he had not yet found his 
vocation in life. I have noted before that for some time he studied mathematical physics. 
He also attended, it is said, one lecture on statistics under G. Udny Yule. He read at 
this time very carefully and very critically Karl Pearson’s “Mathematical Contributions
to the Theory of Evolution.” Shortly before the War, in 1914, he first came into contact
with Karl Pearson. H. E. Soper had discussed in a paper in the Biometrika the 
distribution of the coefficient of correlation in samples from an infinite, normally 
distributed, bi-variate population, but had succeeded in giving only approximate results 
by a tedious process. The problem, which was clearly stated by Soper, was one after 
Fisher’s own heart, and in a week he had sent to Karl Pearson the exact solution but in 
a more hastily and roughly sketched form than even that in which it was subsequently 
published in the Biometrika.¹ This first paper, which was the starting point of the modern
theory of exact sampling distributions, shows in a characteristic manner the working 
of Fisher’s genius. 

It was only after he had completed this important work that Fisher read the earlier 
papers of “Student”,² and noticed how the representation in many-dimensional space
so perfectly supplied the completion of “Student’s” tentative ideas. A little later he
solved the problem of the exact distribution of the intra-class correlation. This
solution together with his z-transformation for coefficients of correlation of both kinds was
published in 1921 in Metron.³

Fisher was now seriously interested in theoretical statistics, and any invitation to 
engage in academic work of a statistical character would have been no doubt welcome, 
but no such opportunity occurred for some considerable time. During the war he was 
at first engaged in statistical work in the office of the Mercantile and General Investment 
Company from 1913 to 1915, and from 1915 to 1919 in teaching physics and mathematics
in public schools. It cannot be said that school teaching was congenial to him, but 
the few years spent in such work was far from wasted. Experience in teaching physics 
supplied what Fisher has always regarded as the most serious omission of his studies at 
Cambridge, namely, actual touch with problems of practical experimentation. On the 
whole therefore it was probably fortunate that the opportunity of independent statistical
work came at a time when his outlook was more mature. 

Rothamsted Experimental Station. 

At the close of the War he was offered the post of the chief statistician under Karl 
Pearson at the Galton Laboratory. At about the same time Sir John Russell, Director of the
Rothamsted Agricultural Station, offered him the post of statistician with opportunities
for building up a Statistical Laboratory at Rothamsted. Neither post had much financial
attraction, but he unhesitatingly accepted the Rothamsted offer as he thought that 
facilities for independent research would be greater there. 

1. Biometrika, x, 507-521 (1915). 

2. W. S. Gosset, whose untimely death in November, 1937, took away a great pioneer in statistics.

3. Metron, i, part 4, 1-32. 





At Rothamsted Fisher found his real vocation for life, and there followed a series
of years of intense activity. On the theoretical side he continued his work on sampling
distributions and in 1921 completed the classical memoir on “Mathematical Foundations of
Theoretical Statistics”¹⁸ which was intended to supply the framework for modern statistical
theory. On the applied side the method for separating the slow changes in time series was
given in the same year,⁴ and a little later in 1924, the important memoir on the influence of
rainfall on the yield of wheat.⁵ In 1923 was published the first paper on field trials⁶
which led to a revolution in the technique of agricultural trials throughout the world, and
was the starting point of the work on the design of experiments. The z-test of significance
and the arithmetical procedure known as the ‘Analysis of Variance’ took their present
form at about the same time.

“Statistical Methods for Research Workers” was first published in 1925, and has 
since then run to six editions. This book has probably done more than anything else 
to make research workers in most diverse fields of study familiar with the practical 
applications of modern statistical methods, and to create a statistical attitude of mind 
among the younger generation of scientists. 

During the next few years we have a number of important papers on exact sampling
distributions, the theory of estimations, the Chi-square measure of contingency, Bayes’
theorem and inverse probability on the mathematical side, and the arrangement of field
experiments on the applied side.

Genetical Theory of Natural Selection.

Fisher had been working on Mendelism and genetics for a long time and occasional
papers had been published since 1918, but from about 1925 this subject began to engage
increasing attention. The book on “Genetical Theory of Natural Selection” was published
in 1930, and constituted a landmark in the history of the subject. A mathematical theory
was developed on the basis of recent genetical researches to establish the principle of
Natural Selection, on a more rigorous basis than Darwin had claimed, as the effective cause
of evolutionary change. This principle is considered, not merely as a qualitative hypothesis
in connexion with the classes of facts which it is capable of explaining, but rather in the
manner of theoretical physics, as an agent which is bound to cause definite changes at
a calculable rate. Factors recognized by Darwin, such as sexual selection, can then be
examined on a quantitative rather than a speculative basis ; and phenomena unknown to
Darwin, such as genetical ‘dominance’ and the selection of fertility in human societies are
found to illustrate unexpected effects of the evolutionary mechanism. Like Clerk-Maxwell
who translated Faraday’s concepts into mathematical language and developed the
electromagnetic theory of light, Fisher gave a quantitative form to Darwin’s views and built up
a statistical theory of evolution.

Academic Honours.

The course of events in his life was in the meantime following a normal channel. 
He was married in 1917 to Ruth Eileen, daughter of H. Grattan Guinness, M.D., and 
several children were born to them. In 1920 he was elected a Fellow of Gonville and
Caius College, Cambridge, and was awarded the Sc.D. degree of the same University
in 1926. The influence of his work was in the meantime spreading rapidly, and he was
elected a Fellow of the Royal Society in 1929, and was awarded the Weldon Medal in the
same year. He was elected Honorary Fellow of the American Statistical Association in 
1930, and Foreign Member of the American Academy of Arts and Science in 1934. 

Galton Professor in London. 

Fisher was appointed Galton Professor in the University of London on the retirement
of Karl Pearson in 1933. He has always been interested in eugenics, and has been
for a long time associated with the Eugenics Society first as Honorary Secretary and later
as Vice-President. Besides the importance of his work in genetics it was therefore
peculiarly appropriate that he should succeed to the Chair created by an endowment from the
great founder of the science of eugenics. He also took over from Karl Pearson the editorial
charge of the Annals of Eugenics from 1933. During the last few years Fisher has
devoted a great deal of attention to the application of statistical theory to genetic research
and a large number of papers have been published in this subject.

4. Jour. Agri. Sc. xi, 107-135.

5. Phil. Trans. Roy. Soc. B, ccxiii, 89-142.

6. Jour. Agri. Sc. xiii, 311-320.

The methods first developed in connexion with agricultural experiments have found 
in recent years increasing application in industry, medicine and public health, education 
and psychology, and scientific experiments of all kinds. In 1935 Fisher published a book 
on “The Design of Experiments” in which the theoretical principles were developed with
a great variety of illustrative examples. 

In 1936 Fisher visited the United States and received an honorary degree from the 
Harvard University on the occasion of its Tercentenary celebrations. Early in 1937 he
accepted the Honorary Fellowship of the Indian Statistical Institute, and the invitation to
preside over the first session of the Indian Statistical Conference in January, 1938. 

Main Currents of Fisher’s Work.

Fisher’s work falls naturally into three main streams:—(a) contributions to the 
mathematical theory of statistics ; (b) application of statistical theory to agriculture and 
the design of experiments ; and (c) contributions to genetics. 

On the mathematical side, for the first time, Fisher has supplied a unified, and general
theory for drawing rigorous conclusions from statistical data. The logical and
philosophical consequences of Fisher’s theory still require to be fully worked out. Its importance
in the theory of knowledge can be easily appreciated from the claim that it supplies the
only rigorous logical foundation for all inductive inferences in science.

The theory of design of experiments is intended to supply an adequate technique for
collecting the primary data in such a way that valid inferences may be drawn from them,
and for extracting the maximum amount of information contained in the data in the most
efficient way.

The object of the statistical theory of evolution is to supply a quantitative and 
mathematical basis for biology in general, and eugenics, the science of man, in particular. 

The Theory of Sampling Distributions.

A brief account is given below of Fisher’s work on the mathematical theory of statistics 
to indicate its importance in recent developments in statistics. 

The idea of the random sampling distribution of statistics is fundamental in modern 
statistical theory. The problem of finding such distributions is, however, one of great 
mathematical difficulty and very little progress was made until Fisher started working on 
the subject. The earliest example of the modern type of distribution was that of the 
Chi-square found by Karl Pearson in 1900. Several years later “Student” gave the 
correct distribution of the sample variance and of his now famous t-statistic or the mean 
divided by its estimated standard deviation.
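
By way of illustration (this is not part of the original note), the t-statistic just described may be sketched in a few lines of Python. The function name `t_statistic` and the hypothesised mean `mu0` are assumptions introduced here; the statistic is the deviation of the sample mean from `mu0` divided by the estimated standard deviation of that mean.

```python
import math
import statistics


def t_statistic(sample, mu0=0.0):
    """Return "Student's" t: (mean - mu0) / (s / sqrt(n)).

    `mu0` is a hypothetical value of the population mean supplied by the
    user; it is an assumption of this sketch, not taken from the text.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)          # estimated s.d., divisor n - 1
    return (mean - mu0) / (s / math.sqrt(n))


t = t_statistic([2.0, 4.0, 6.0])          # mean 4, s = 2, n = 3
```

The value obtained would then be referred to the distribution of t with n - 1 degrees of freedom.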

I have already mentioned however that Fisher was not acquainted with “Student’s”
work when he wrote his 1914 paper on the exact distribution of the correlation coefficient.
Here he introduced for the first time the brilliant technique of representing a sample of
size n by a point in a space of n dimensions. Such representation has proved extremely
useful in subsequent work not only in the theory of distribution, but also in other fields
of statistical theory such as the work of J. Neyman and E. S. Pearson. The same
representation was used by Fisher in finding the exact distribution of the mean error and the
mean square error in 1921,⁷ the regression coefficient in 1922,⁸ the partial correlation
coefficient in 1924,⁹ and the coefficient of multiple correlation in 1928.¹⁰

7. Metron, i, Part 4, 1-32.

8. Jour. Roy. Stat. Soc. lxxxv, 597-612.

9. Metron, iii, 329-332.

10. Proc. Roy. Soc. A, cxxi, 654-673.



In 1923¹¹ he gave a rigorous proof of “Student’s” result for the t-statistic, and a little
later showed how it could be used for testing various statistical hypotheses.¹² In 1924
Fisher generalized “Student’s” t to the well-known z-statistic or half the logarithm of the
ratio of two estimated variances based on different degrees of freedom, and was able,
with the help of this distribution, to give a unified treatment of practically all the
important distributions involved in testing null hypotheses.¹³ The distribution of the
discriminant function¹⁴ recently introduced by him to test the significance of the difference,
as a whole, of two sets of means of samples drawn from a normal multi-variate correlated
population, belongs to the same general class.
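
The definition of z given above can be made concrete with a short Python sketch, added here for illustration only. The function name `fisher_z` and the sample figures are assumptions of the sketch; the natural logarithm is used, and each variance is estimated with the divisor n - 1.

```python
import math
import statistics


def fisher_z(sample_a, sample_b):
    """Return (z, n1, n2), where z is half the natural logarithm of the
    ratio of the two estimated variances and n1, n2 are their degrees of
    freedom. Names are hypothetical, chosen for this sketch."""
    v1 = statistics.variance(sample_a)    # estimated variance, divisor n - 1
    v2 = statistics.variance(sample_b)
    z = 0.5 * math.log(v1 / v2)
    return z, len(sample_a) - 1, len(sample_b) - 1


# Illustrative figures only: variances 2.5 and 10, so z = 0.5 * ln(0.25).
z, n1, n2 = fisher_z([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

The variance ratio itself, exp(2z), is the quantity now more commonly tabulated; z and the ratio carry the same information.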

The work on the limiting forms of frequency distributions of the largest or smallest
member of a sample,¹⁵ of the error of an interpolated value,¹⁶ and the exact distributions
which arise in connexion with various tests of significance in harmonic analysis¹⁷ may also
be mentioned here.


Theory of Estimation and Statistical Inference.

In science we are always faced with the problem of arguing from the particular to
the general, or in statistical language, from the sample to the population. The task of
statistical estimation is to find, on the basis of an observed sample, the values of the
unknown parameters of the population from which the sample has been derived. Fierce
controversy has raged over this subject ever since the publication in 1763 of Bayes’
posthumous memoir “An essay towards solving a problem in the doctrine of chances”
(Phil. Trans. liii, p. 370) in which he proposed to solve this problem with the help of the
principle of equal distribution of ignorance. After more than one century and a half new
light was shed on this problem by Fisher in his remarkable memoir “On the Mathematical
Foundations of Theoretical Statistics.”¹⁸ This paper laid the foundations of statistical
inference by emphasizing the importance of exact solutions of sampling problems, and
by supplying the elements of a valid theory of estimation.

Any statistic¹⁹ which estimates a certain parameter must of course satisfy the criterion
of ‘consistency’, that is, its value must tend to the estimated parameter as the sample
size is indefinitely increased. Fisher however gives a second criterion, namely that of
‘efficiency’, which requires that the variance of the estimating statistic (at least for large
samples) should not exceed that of any other consistent statistic estimating the same
parameter.
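
The criterion of efficiency may be illustrated by a small sampling experiment, sketched below in Python as an addition to the original note. For samples from a normal population both the mean and the median are consistent estimates of the centre, but the mean has the smaller sampling variance; in large samples the ratio of the two variances tends to 2/π, or about 64 per cent. The sample size, number of replications, and seed are arbitrary assumptions of the sketch.

```python
import random
import statistics


def sampling_variance(estimator, n, replications, rng):
    """Variance of `estimator` over repeated normal samples of size n."""
    values = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]
    return statistics.variance(values)


rng = random.Random(1)                    # fixed seed, so runs are repeatable
var_mean = sampling_variance(statistics.fmean, 25, 2000, rng)
var_median = sampling_variance(statistics.median, 25, 2000, rng)

# Efficiency of the median relative to the mean for this sample size.
relative_efficiency = var_mean / var_median
```

With samples of 25 the ratio should fall somewhat below unity, in rough agreement with the large-sample value, though the exact figure depends on the seed.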

This idea had occurred in an earlier paper of 1920, but a great achievement of the 1921
memoir consisted in providing a machinery for calculating such efficient statistics, in the
absence of which we would have been compelled to grope with different consistent statistics,
and to compare their variances. This machinery is supplied by Fisher’s now well-known
method of ‘maximum likelihood’, a maximum likelihood statistic being always an efficient
solution.

In the same memoir Fisher introduces for the first time the idea of a ‘sufficient’ statistic
T, which is inevitably also efficient, and which incorporates the whole of the information
available in the sample in regard to a given parameter, in the sense that if T' is any other
statistic estimating the same parameter, the joint distribution of T and T' is such that, for
a given value of T, the distribution of T' does not involve the parameter estimated. When
a sufficient statistic exists, it can always be obtained by the method of maximum
likelihood. Many interesting applications of these ideas were given in the same paper, such
as errors of grouping, inefficiency of estimation of parameters by the method of moments
in Pearson’s system of frequency curves, and the estimation of the number of
micro-organisms in a sample of water or soil.

11. Proc. Camb. Phil. Soc. xxi, 655-658.

12. Proc. Int. Math. Congress, Toronto 1924.

13. Econometrica, iii, 353-365 (1935).

14. Annals of Eugenics, vii (1938), 179-188.

15. Proc. Camb. Phil. Soc. xxiv (1928), 180-190.

16. Proc. Camb. Phil. Soc. xxiii (1927), 912-921.

17. Proc. Roy. Soc. A, cxxv (1929), 54-59.

18. Phil. Trans. Roy. Soc. A, ccxxii, 309-368.

19. The word “statistic” in the singular number has been reserved by Fisher to denote a quantity
which is capable of being expressed only in terms of actual observations in the sample. This
concept was originally introduced by “Student”, and has been given especial emphasis in
Fisher’s theory.

The concept of ‘information’, first tentatively introduced in the 1921 paper, was more
fully developed in a later paper²⁰ in 1925, enabling the theory of estimation to be freed
from the large sample assumption. The concept of ancillary statistics, which serve to
enhance the precision of estimation in the case when no sufficient statistic exists, was also
introduced. In 1934 the general method of finding the distribution of any sufficient statistic
was indicated,²¹ and it was shown that in certain cases when no sufficient statistic exists,
the whole of the information contained in the sample may be recovered by using as
ancillary what may be termed the configuration of the sample. The important properties
of ‘information’, as technically defined by Fisher, and ‘the maximum likelihood estimate’,
were clearly summarized in a paper in the same year,²² and a revised proof of the two main
theorems in the theory of estimation, together with illustrative examples on the use of
ancillary statistics, were given in a paper in 1935.²³

Another important contribution of Fisher to the theory of statistical inference is his
idea of ‘fiducial probability’ introduced in 1930, which is designed to cover the case when
we do not want a single estimate of our unknown parameter, but require an interval in
which our unknown parameter may be expected to lie.²⁴ Fisher has laid emphasis on
the fact that the concept of fiducial probability, though entirely different from that of
ordinary probability, is equally rigorous ; and has brought out the importance of the
fiducial concept in statistical theory in other papers.²⁵

The Chi-square test was perhaps Karl Pearson’s greatest single discovery ; but Fisher
has done much to make its use popular. In 1922 he showed that in the case of
contingency tables, for which the margins of the expected table are reconstructed from those
of the observed table, the distribution of Chi-square as given by Pearson’s formula can
be used validly if we take for n' a number exceeding the degrees of freedom by unity.²⁶
This result was discussed in other papers and fully corroborated by material derived from
random sampling experiments.²⁷ Fisher also brought the Chi-square test in relation with
his general theory of statistical inference, showing in particular that the method of
minimising Chi-square agrees for large samples with the method of maximum likelihood.²⁸
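
The contingency-table rule may be made concrete by the following Python sketch, which is an illustration added here and not taken from the note. It assumes the usual count of (r - 1)(c - 1) degrees of freedom for a table of r rows and c columns whose expected values are reconstructed from the observed margins; the function name and the example table are hypothetical.

```python
def chi_square_contingency(table):
    """Return (chi_square, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count reconstructed from the observed margins.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, degrees_of_freedom


# Illustrative 2 x 2 table: every expected count is 15.
chi_square, df = chi_square_contingency([[10, 20], [20, 10]])
n_prime = df + 1      # the n' exceeding the degrees of freedom by unity
```

For a fourfold table there is thus a single degree of freedom, and the older tables would be entered with n' = 2 rather than with the number of cells.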

There are many other papers in mathematical statistics²⁹ among which the most
important are probably those on the use of combinatorial methods for calculating moment
coefficients,³⁰ and on the number of reduced Latin Squares of different orders.³¹

Statistical Theory in Agriculture.

Fisher’s influence in India has been of the greatest importance in connexion with 
the application of statistical theory to agriculture. The basic principles of the new method 
are now well-known and need not be discussed in detail. In order to appreciate the 
revolutionary advance brought about by the introduction of the new technique, let us, 
however, consider for a moment the contrast between experiments of the old and the new 
type. 

20. Proc. Camb. Phil. Soc. xxii, 700-725.

21. Proc. Roy. Soc. A, cxliv, 285-307.

22. Proc. Roy. Soc. A, cxlvi, 1-8.

23. Jour. Roy. Stat. Soc. xcviii, 39-82.

24. Proc. Camb. Phil. Soc. xxvi, 528-535.

25. Proc. Camb. Phil. Soc. xxviii (1932), 257-261; Proc. Roy. Soc. cxxxix (1933), 343-348; Annals
Eugenics, vi (1935), 391-398.

26. Jour. Roy. Stat. Soc. lxxxv, 87-94.

27. Economica, iii (1923), 139-147; Eugenics Review, xviii (1926), 32-33.

28. Ref. nos. (18) and (12) and Bologna. Atti del Congresso Int. dei Matematici, vi (1928), 94-100.

29. Among other miscellaneous work may be mentioned the method for testing the randomness of
a sequence (Quar. Jour. Roy. Met. Soc. lii, 250, 1926), and the properties and applications of the
integrals and derivatives of the normal error function (Brit. Assoc. Mathematical Tables, Vol. 1).

30. Proc. Lond. Math. Soc. xxx, 199-238 (1929); Proc. Roy. Soc. A, cxxx, 16-28 (1930); Proc. Lond.
Math. Soc. xxxiii, 195-208 (1930).

31. Proc. Camb. Phil. Soc. xxx (1934), 492-507.





Suppose we wish to compare the yield of, say, six varieties or the effect on yield
of six kinds of manures. In the old type of experiment the field would be divided into
six plots, and a single plot would be allotted to each treatment. As Fisher explains,³²
“the treatment giving the highest yield would of course appear to be the best, but no
one could say whether the plot would not in fact have yielded as well under some or all
of the other treatments.” It is known that within the same field wide differences exist
in the fertility of the soil. Even when the soil fertility is uniform, there are innumerable
other causes which affect the yield. How can we be sure that the observed differences
in yield are due to the difference in the treatments, and not to soil heterogeneity? How
can we be sure that they are not due to chance fluctuations? This is the basic problem.
In order to solve it we must eliminate the effect of soil heterogeneity, and make an
unbiased estimate of the magnitude of errors due to chance so that we may be sure that
the observed effect is significant in comparison with the size of such chance errors.

Let us now see how Fisher solved the problem. Consider the same experimental
field which had been originally divided into six portions. Fisher simply further
subdivided each portion³³ into a number of plots of smaller size; and within each portion
(or block as he called it) he assigned one plot to each treatment but strictly in a random
manner. We have now the randomized block in its modern form. Using the principle
of block division in two directions symmetrically we get the well-known Latin square.

The important point to be noticed is that the results will be now governed entirely
by the laws of chance. There are innumerable causes which produce differences between
the plots, and we know from the conditions of the experiment that it is impossible in
practice to secure that the plots will be all alike. But the validity of the estimate of
error is now guaranteed by the process of randomization, namely “the provision that
any two plots, not in the same block, shall have the same probability of being treated
alike, and the same probability of being treated differently in each of the ways in which
this is possible.” The calculus of probability and the apparatus of the statistical theory
of sampling distributions can be now used with complete confidence. The logical
foundations of scientific inference were thus made secure, and agricultural experiments were
placed for the first time on the same footing as experiments in other sciences.

The second point to be observed is that by the technique of block division the
problem of soil heterogeneity was solved at the same time. As each block contains all
the treatments, differences between the total yields of the different blocks could be safely
ascribed, apart from errors of sampling, to soil differences; and could be eliminated by
suitable statistical methods. This of course led to a great improvement in the precision
of the comparisons. When we remember that in particular experiments in India as much
as ninety per cent. of the total variation is sometimes caused by soil differences, the
importance of eliminating the effect due to such differences will be easily appreciated.
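
The elimination of block differences described above can be illustrated by a small sketch of the usual sum-of-squares partition for a randomized block experiment. The function name `rcb_anova` and the yields are hypothetical and chosen only for illustration; they are not taken from any experiment mentioned in the text.

```python
def rcb_anova(yields):
    """Partition the total sum of squares of a randomized block experiment.

    yields[b][t] is the yield of treatment t in block b (hypothetical data).
    """
    n_blocks = len(yields)
    n_treat = len(yields[0])
    grand = sum(sum(row) for row in yields) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in yields]
    treat_means = [sum(row[t] for row in yields) / n_blocks
                   for t in range(n_treat)]

    ss_total = sum((y - grand) ** 2 for row in yields for y in row)
    # Variation between blocks: ascribed to soil differences and removed.
    ss_blocks = n_treat * sum((m - grand) ** 2 for m in block_means)
    # Variation between treatments: the comparison of interest.
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    # What remains estimates the experimental error.
    ss_error = ss_total - ss_blocks - ss_treat
    return {"total": ss_total, "blocks": ss_blocks,
            "treatments": ss_treat, "error": ss_error}


# Made-up example: large block (soil) effects, small treatment effects.
block_effect = [0.0, 10.0, 20.0]
treat_effect = [0.0, 1.0, 2.0, 3.0]
example = [[b + t for t in treat_effect] for b in block_effect]
result = rcb_anova(example)
share_due_to_blocks = result["blocks"] / result["total"]
```

In this artificial case nearly all of the total variation is carried by the block term, so that removing it leaves the treatment comparison far more precise than an unblocked analysis would allow.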

Replication, randomization and block division or local control are thus the 
fundamental principles of design introduced by Fisher. Replication is essential 
because it is the sole source of the estimate of error, while randomization is 
necessary to guarantee the validity of the estimate, that is, to ensure that the estimate 
will be unbiased. The purpose of block division is to increase the precision of the 
comparisons by elimination of soil differences, while replication is also useful in securing 
the same object by diminishing the experimental error. Finally, the analysis of variance 
gives a convenient and valid method of extracting the information contained in the 
observations. As Wishart has pointed out, the Fisherian technique “was something in 
the nature of a revolution,” and altered the subsequent course of agricultural experiments 
throughout the world. 
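
As a minimal sketch of the three principles, the following lays out a randomized block design: each block is a replicate containing every treatment once, and the order within a block is drawn at random. The function name `randomized_blocks`, the treatment labels, and the seed are assumptions made for illustration only.

```python
import random


def randomized_blocks(treatments, n_blocks, seed=None):
    """Return one random ordering of all treatments for each block.

    Replication: every treatment appears once in every block.
    Randomization: the plot order inside each block is shuffled.
    Local control: comparisons are made within blocks.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # independent random order per block
        layout.append(order)
    return layout


# Six hypothetical treatments in four blocks.
layout = randomized_blocks(["A", "B", "C", "D", "E", "F"], n_blocks=4, seed=1)
```

A fixed seed is used here only to make the sketch repeatable; in an actual experiment the allocation would be drawn afresh.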


32. The Design of Experiments. 1935, p. 69. 

33. I need scarcely add that the experimental field may be divided into any number of convenient
portions each of which is further sub-divided into a number of plots. 






The Introduction of the Fisherian Technique in India. 

The modern period of field experiments began in India with the foundation of the
Imperial Council of Agricultural Research in 1929, which, from its very inception,
laid great emphasis on the use of statistical methods, created a statistical section at its
headquarters, and gave a grant to the Statistical Laboratory, Calcutta, for work in this
connexion. The earliest analysis of an experiment of the Latin square type was published
from the Laboratory in 1931. The first complex (variety-manurial) experiment in 1932,
the first split-plot experiment in 1933, and the earliest experiment using the principle of
‘confounding’ in 1935 were all designed and introduced from the same Laboratory. The
use of the new technique has spread rapidly, and no important experiment in India is now
laid out on a design of the old type. The ultimate beneficial effect of this movement
on Indian agriculture is difficult to estimate or exaggerate.


Genetic Studies.

Fisher’s method of maximum likelihood is now well-known and is being increasingly
adopted in genetical work in India. More important than these contributions to ‘formal
questions’ as Fisher himself calls them are his contributions to the problems of quantitative
characters which depend on a large number of mendelian factors of small individual
effects. Almost all economic characters in which the animal or plant breeder is interested
are of this nature and here the usual genetic analysis completely breaks down. If,
therefore, genetics is to exert its influence on breeding, geneticists and breeders must
concentrate on the study of quantitative characters. Fisher has shown how statistics of
the second and third degree can be utilised for this purpose, and has given methods for
the estimation of various genetic quantities from which genetic variances can be calculated
and the effects of dominance and environmental causes eliminated.

A knowledge of genetic variance and of the possible number of factors involved in a
character is of fundamental importance to the breeder in assessing the selection potentiality
of his material. Fisher has shown the way to attain this knowledge. The experimental
study, however, requires a sound lay out and elaborate recording of observational data
to make statistical analysis possible. Work has recently been started on cotton at the
Institute of Plant Industry at Indore on the new lines, and it may be confidently asserted
that the influence of Fisher will ere long become increasingly important in this field of
study also.


The Influence of Fisher on Statistical Work in India. 

In recent developments in statistics in India the most powerful influence has been
that of R. A. Fisher. The movement for the study of analytic statistics started in this
country at a time when Fisher’s work was just coming into prominence, and the younger
workers have been all brought up in what may be called the Fisherian tradition. The
methods developed by him are being used not only in agriculture, but also in all kinds of
investigations of importance to national welfare. On the theoretical side, what little is
being done in India is based largely on his work. He has been the accepted and
acknowledged leader for a long time, but his presence here as the first President of the Indian
Statistical Conference has established personal relations which are highly cherished by all
statistical workers in India.*


Calcutta, 

6th January, 1938.


P. C. Mahalanobis. 


* Written on the occasion of the opening of the First Session of the Indian Statistical Conference
presided over by Prof. R. A. Fisher in Calcutta on the 7th January, 1938.






CONTENTS


Paper 1. On the “Probable Error” of a Coefficient of Correlation Deduced from a Small Sample
Paper 2. A Mathematical Examination of the Methods of Determining the Accuracy of an Observation by the Mean Error, and by the Mean Square Error
Paper 3. Studies in Crop Variation. I. An Examination of the Yield of Dressed Grain from Broadbalk
Paper 4. The Accuracy of the Plating Method of Estimating the Density of Bacterial Populations
Paper 5. On the Interpretation of Chi Square from Contingency Tables, and the Calculation of P
Paper 6. The Goodness of Fit of Regression Formulae and the Distribution of Regression Coefficients
Paper 7. Statistical Tests of Agreement between Observation and Hypothesis
Paper 8. The Conditions under which Chi Square Measures the Discrepancy between Observation and Hypothesis
Paper 9. On a Property Connecting the Chi-Square Measure of Discrepancy with the Method of Maximum Likelihood
Paper 10. On the Mathematical Foundations of Theoretical Statistics
Paper 11. Theory of Statistical Estimation
Paper 12. On a Distribution Yielding the Error Functions of Several Well Known Statistics
Paper 13. The Mathematical Distributions Used in the Common Tests of Significance
Paper 14. The General Sampling Distribution of the Multiple Correlation Coefficient
Paper 15. Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample
Paper 16. Tests of Significance in Harmonic Analysis
Paper 17. The Arrangement of Field Experiments
Paper 18. Studies in Crop Variation. VI. Experiments on the Response of the Potato to Potash and Nitrogen
Paper 19. The Distribution of Gene Ratios for Rare Mutations
Paper 20. Moments and Product Moments of Sampling Distributions
Paper 21. The Moments of the Distribution for Normal Samples of Measures of Departure from Normality
Paper 22. Inverse Probability
Paper 23. The Sampling Error of Estimated Deviates, together with Other Illustrations of the Properties and Applications of the Integrals and Derivatives of the Normal Error Function
Paper 24. Two New Properties of Mathematical Likelihood
Paper 25. The Fiducial Argument in Statistical Inference
Paper 26. The Logic of Inductive Inference
Paper 27. Uncertain Inference
Paper 28. A Test of the Supposed Precision of Systematic Arrangements
Paper 29. Professor Karl Pearson and the Method of Moments
Paper 30. Moments and Cumulants in the Specification of Distributions
Paper 31. The Wave of Advance of Advantageous Genes
Paper 32. The Use of Multiple Measurements in Taxonomic Problems
Paper 33. The Statistical Utilization of Multiple Measurements
Paper 34. The Precision of Discriminant Functions
Paper 35. The Comparison of Samples with Possibly Unequal Variances
Paper 36. The Sampling Distribution of Some Statistics Obtained from Non-linear Equations
Paper 37. On the Similarity of the Distributions Found for the Test of Significance in Harmonic Analysis, and in Stevens’s Problem in Geometrical Probability
Paper 38. The Negative Binomial Distribution
Paper 39. The Theory of Confounding in Factorial Experiments in Relation to the Theory of Groups
Paper 40. A System of Confounding for Factors with More than Two Alternatives, Giving Completely Orthogonal Cubes and Higher Powers
Paper 41. Some Combinatorial Theorems and Enumerations Connected with the Numbers of Diagonal Types of a Latin Square
Paper 42. The Likelihood Solution of a Problem in Compounded Probabilities
Paper 43. The Relation between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population



1 


ON THE “PROBABLE ERROR” OF A COEFFI¬ 
CIENT OF CORRELATION DEDUCED FROM A 
SMALL SAMPLE 


AUTHOR’S NOTE 

This is the second of three papers dealing with the sampling errors of
correlation coefficients covering the cases (i) “The frequency distribution
of the values of the correlation coefficient in samples from
an indefinitely large population,” Biometrika, Vol. 10, pp. 507-521,
1915.

Here the method of defining the sample by the coordinates of a
point in Euclidean hyperspace was introduced, and it was shown
that the exact sampling distribution could be obtained. The practical
application of these results appears in the second paper in 1921,
here referred to. It was concerned primarily to show that the sampling
distribution was different for intraclass and interclass correlations,
and to give the exact solution for the former, comparable with
that given for the latter in 1915. The special simplicity of the solution
in the intraclass case was one of the foundations of the recognition
of the z-distribution, and the analysis of variance, in terms of
which it would now be treated.

The third of these papers appeared in 1924, also in Metron: (iii)
“The distribution of the partial correlation coefficient,” Metron,
Vol. 3, pp. 329-332, 1924.

It shows that the effect of the elimination of variates by partial 
correlation is simply to reduce the effective size of the sample by 
unity for each independent variate eliminated. 

This group of three papers is part of a larger series appearing from 
1915 to 1928, in which exact solutions were found for a variety of 
problems of distribution, and the corresponding tests of significance 
developed. Although many of these problems had been approached 

* This paper appeared in Metron, Vol. 1, No. 4, pp. 3-32, 1921. The established
policy of Metron does not permit the republication of articles from that journal. 





using statistics called correlation coefficients (e.g., Biserial r, Biserial η,
etc.), yet it appears that the term was widely misapplied,
and the problems themselves are now simply treated as comparisons
of means, tests of heterogeneity, or regression problems.

In the final section this paper contains a discussion of the bearing 
of the new exact solutions of distribution problems on the nature of 
inductive inference. This is of interest for comparison with the 
paper on “Inverse probability,” published 1930, nine years later.
In 1930 the notion of fiducial probability is first introduced using as 
example the distribution found in this paper. In view of this later 
development the statement, “We can know nothing of the proba¬ 
bility of hypotheses or hypothetical quantities,” is seen to be hasty 
and erroneous, in the light of the different type of argument later 
developed. It will be understood, however, as referring only to the 
Bayesian probabilities a posteriori. 





2 

A MATHEMATICAL EXAMINATION OF THE 
METHODS OF DETERMINING THE ACCU¬ 
RACY OF AN OBSERVATION BY THE MEAN 
ERROR, AND BY THE MEAN SQUARE 
ERROR 

AUTHOR’S NOTE 

This early paper arose from an examination of a statement by
A. S. Eddington in his book, Stellar Movements, page 147. It concerns
the relative precision of two estimates of the variance of a normal
distribution: (a) using Bessel’s formula, based on the mean
square; and (b) using Peter’s formula, using the mean deviation.

The results include the exact sampling distribution of Bessel’s estimate
and the mean square of Peter’s. The variance of the latter
is, in large samples, the larger in the ratio $(\pi - 2)$.

Next is considered the class of estimates based on powers of the
deviation in general, showing that the precision is maximised when
$p = 2$, the variance being 14 per cent greater for $p = 1$ and 9 per
cent greater for $p = 3$. For continuous variation of $p$ the precision
of the mean square is a true maximum. These results had all been
obtained before, without the knowledge of the author or of Eddington.

The most important point of the paper is the consideration of the
simultaneous distribution of two estimates. This is examined in
detail for the case of four observations, but the more general point
is established that for a given value of $\sigma_2$ the distribution of $\sigma_1$ is
independent of $\sigma$. Consequently when $\sigma_2$, the estimate based on the
mean square, is known, a value of $\sigma_1$, the estimate based on the mean
deviation, gives no additional information as to the true value. It is
shown that the same proposition is true if any other estimate is
substituted for $\sigma_1$, and consequently that the whole of the information
respecting the variance which a sample provides is summed up in
the single estimate $\sigma_2$. I believe this is the first occasion on which
attention has been called to this property characteristic of a sufficient
estimate.

* Reprinted from Monthly Notices of the Royal Astronomical Society, Vol.
LXXX, No. 8, pp. 758-770, 1920.





A Mathematical Examination of the Methods of determining the
Accuracy of an Observation by the Mean Error, and by the
Mean Square Error. By R. A. Fisher, M.A.

1. In estimating the precision of a number of observations 
two methods are in common use: that of the Mean Square Error, 
and that of the Mean Error. It is, I believe, usually admitted 
that the former has the firmer mathematical basis, although it is 
sometimes asserted that the latter is more accurate. It is not 
generally recognised that the merits of the two methods may be 
compared with precision. The case is of interest in itself, and it 
will be found that the method here outlined is illuminating in all
similar cases, where the same quantity may be ascertained by more 
than one statistical formula. 

Suppose the probability distribution of each observation to be
centred about a true mean $m$, with normal distribution and
standard deviation $\sigma$, so that the chance of any observation falling
in the range $dx$ is

$$\frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx.$$

The unknown, $\sigma$, is to be determined from $n$ observations,
$x_1, x_2, \ldots, x_n$. If

$$n\bar{x} = S(x),$$

$$\sigma_1 = \frac{1}{n}\sqrt{\frac{\pi}{2}}\; S(|x-\bar{x}|),$$

and

$$n\sigma_2^2 = S(x-\bar{x})^2,$$

then $\sigma_1$ is the value obtained by the method of Mean Error, $\sigma_2$
that obtained by the method of Mean Square Error, and $\sigma$ the
true value. Both $\sigma_1$ and $\sigma_2$ may be adjusted by means of appropriate
functions of $n$ so as to make the mean value of each of them
obtained from a number of samples agree with the true value, but
this for the moment is immaterial.

2. The distribution of $\sigma_2$.

I have described elsewhere (“Frequency Distribution of the
Values of the Correlation Coefficient in Samples from an Indefinitely
Large Population,” Biometrika, 10, 507) a method
by which the frequency distribution of $\sigma_2$ may be established.
If $x_1, x_2, \ldots, x_n$ are co-ordinates in generalised space of $n$
dimensions, then any sample is represented by a single point
having the observed values as co-ordinates. Let O be the origin
and C the point at which every observed value is equal to $m$;
then at any point along the line OC, produced indefinitely in both
directions, all the co-ordinates are equal.

Let the point P represent the sample; from P draw PM
perpendicular to OC, then it is easy to see that M is the point,

$$x_1 = x_2 = x_3 = \cdots = \bar{x},$$



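and that

$$PM^2 = S(x-\bar{x})^2 = n\sigma_2^2.$$

Now, since the chance of any observation falling in the range $dx$ is

$$\frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx,$$

the chance of the sample falling in the space $dx_1, dx_2, \ldots, dx_n$ is

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{S(x-m)^2}{2\sigma^2}}\, dx_1\, dx_2 \cdots dx_n;$$

but

$$S(x-m)^2 = n(\bar{x}-m)^2 + n\sigma_2^2,$$

and the element of volume is evidently proportional to

$$\sigma_2^{\,n-2}\, d\sigma_2\, d\bar{x},$$

so that the chance of a sample falling in the range $d\bar{x}\, d\sigma_2$ is
proportional to

$$e^{-\frac{n(\bar{x}-m)^2}{2\sigma^2}-\frac{n\sigma_2^2}{2\sigma^2}}\; \sigma_2^{\,n-2}\, d\bar{x}\, d\sigma_2.$$

The distributions of $\bar{x}$ and $\sigma_2$ are therefore perfectly independent;
the chance of $\bar{x}$ falling in the range $d\bar{x}$ is

$$\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\; e^{-\frac{n(\bar{x}-m)^2}{2\sigma^2}}\, d\bar{x},$$

and the chance of $\sigma_2$ falling in the range $d\sigma_2$ is*

$$\frac{n^{\frac{n-1}{2}}}{2^{\frac{n-3}{2}}\left(\frac{n-3}{2}\right)!}\; \frac{\sigma_2^{\,n-2}}{\sigma^{\,n-1}}\; e^{-\frac{n\sigma_2^2}{2\sigma^2}}\, d\sigma_2.$$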

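We may note at once that the mean value of $\sigma_2$ is

$$\sigma\sqrt{\frac{2}{n}}\; \frac{\left(\frac{n-2}{2}\right)!}{\left(\frac{n-3}{2}\right)!},$$

while the mean value of $\sigma_2^2$ is

$$\frac{n-1}{n}\,\sigma^2.$$

As $n$ is increased the curve rapidly tends to the normal form;
the mean is approximately

$$\sigma\left(1-\frac{3}{4n}\right),$$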

* The symbol $x!$ is here used in a sense equivalent to $\Pi(x)$ or $\Gamma(x+1)$,
whether $x$ is an integer or not.



whence it is easy to see that the standard error of $\sigma_2$ is

$$\frac{\sigma}{\sqrt{2n}}.$$

If it were desired to bring the mean into coincidence with
the true value $\sigma$, the value of $\sigma_2$ obtained should evidently be
multiplied by

$$1+\frac{3}{4n}.$$

This correction, if it be so considered, is never of importance;
when $n$ is large the value $\frac{3}{4n}$ is of higher order than the standard
error. Even when $n = 2$, and the correction amounts to an increase
of 35 per cent., the standard error is much greater, being
75 per cent. of the mean.
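
The mean and standard error of the mean-square estimate can be checked numerically. The sketch below assumes the estimate is defined by $n\sigma_2^2 = S(x-\bar{x})^2$, so that $n\sigma_2^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom; the function names are hypothetical.

```python
import math


def mean_sigma2(n, sigma=1.0):
    """Exact mean of sigma_2: sigma * sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2)."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    return sigma * math.sqrt(2.0 / n) * math.exp(log_ratio)


def se_sigma2(n, sigma=1.0):
    """Exact standard error, using E[sigma_2^2] = (n - 1) / n * sigma^2."""
    second_moment = sigma ** 2 * (n - 1) / n
    return math.sqrt(second_moment - mean_sigma2(n, sigma) ** 2)


# Large-sample forms quoted in the text, compared at n = 100.
n = 100
approx_mean = 1.0 - 3.0 / (4.0 * n)
approx_se = 1.0 / math.sqrt(2.0 * n)
exact_mean, exact_se = mean_sigma2(n), se_sigma2(n)

# Ratio of standard error to mean for the smallest sample, n = 2.
ratio_n2 = se_sigma2(2) / mean_sigma2(2)
```

For $n = 2$ the ratio of standard error to mean comes out close to three quarters, in agreement with the 75 per cent. stated above.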

3. Standard error of $\sigma_1$.

If the co-ordinates $x_1, x_2, \ldots, x_n$ are the deviations of individuals
of a sample from its mean, the representative point lies
in a plane space of $(n-1)$ dimensions, in which

$$S(x) = 0.$$

The frequency density at any distance, $r$, from the origin is
proportional to

$$e^{-\frac{r^2}{2\sigma^2}},$$

and

$$r^2 = S(x^2).$$

The region in which any co-ordinate has an assigned value, $x_1$,
is a plane space of $(n-2)$ dimensions, at a distance

$$x_1\sqrt{\frac{n}{n-1}}$$

from the origin, and the frequency with which $x_1$ falls into the
range $dx_1$ is therefore proportional to

$$e^{-\frac{n x_1^2}{2(n-1)\sigma^2}}\, dx_1.$$

Thus deviations from the mean of samples of $n$ of a normal
population are themselves normally distributed. The deviations
from the mean of the population are, however, independent, while
deviations from the mean of the same sample are not. Consider
the distribution of pairs of values, $x_1$ and $x_2$.

The space in which the representative points lie is parallel to
the line

$$x_1 + x_2 = 0, \qquad x_3 = x_4 = \cdots = x_n = 0,$$

while it makes with the line

$$x_1 - x_2 = 0, \qquad x_3 = x_4 = \cdots = x_n = 0,$$

an angle the cosine of which is

$$\sqrt{\frac{n-2}{n}}.$$

Consequently the frequency in the range $dx_1\, dx_2$ is proportional to

$$e^{-\frac{n-1}{2(n-2)\sigma^2}\left(x_1^2+\frac{2}{n-1}x_1x_2+x_2^2\right)}\, dx_1\, dx_2,$$

showing a surface of normal correlation, with correlation coefficient,
$-\frac{1}{n-1}$, between any two deviations.

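If the deviation be considered without reference to sign, each
is distributed from $0$ to $\infty$ with frequency,

$$\frac{2}{\sigma}\sqrt{\frac{n}{2\pi(n-1)}}\; e^{-\frac{n x^2}{2(n-1)\sigma^2}}\, dx,$$

and each pair with frequency,

$$\frac{2n}{\pi\sigma^2\sqrt{n(n-2)}}\; e^{-\frac{(n-1)(x^2+y^2)}{2(n-2)\sigma^2}}\, \cosh\frac{xy}{(n-2)\sigma^2}\; dx\, dy.$$

The mean value of $x$ is therefore

$$\sigma\sqrt{\frac{2(n-1)}{\pi n}},$$

so that the mean value of $\sigma_1$ is

$$\sigma\sqrt{\frac{n-1}{n}},$$

as is generally known.

The mean value of $x^2$ is

$$\frac{n-1}{n}\,\sigma^2,$$

and that of $xy$ is

$$\frac{2\sigma^2}{\pi n}\left\{\sqrt{n(n-2)}+\sin^{-1}\frac{1}{n-1}\right\},$$

whence it follows that the mean value of $\sigma_1^2$ is

$$\frac{(n-1)\,\sigma^2}{n^2}\left\{\frac{\pi}{2}+\sqrt{n(n-2)}+\sin^{-1}\frac{1}{n-1}\right\}.$$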



As in the case of $\sigma_2$, the mean differs from the true value
by a term of the order $\frac{1}{n}$. When $n$ is large this is insignificant,
for the mean value of $\sigma_1^2$ is approximately

$$\sigma^2\left(1+\frac{\pi-4}{2n}\right),$$

hence the standard error is

$$\sigma\sqrt{\frac{\pi-2}{2n}}.$$

As $n$ is made large, therefore the standard error of $\sigma_1$ tends to
bear a constant ratio to that of $\sigma_2$. The former is the larger in
the ratio $\sqrt{\pi-2}$; in other words, the value of the standard
deviation, or probable error, obtained from the Mean Square
Deviation of a sample has greater weight by 14 per cent. than that
obtained from the Mean Deviation. To obtain a result of equal
accuracy by the latter method, the number of observations must
be increased by 14 per cent.*
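
The 14 per cent. figure can be checked without simulation. The sketch below takes, as stated assumptions, the definitions $\sigma_1 = \frac{1}{n}\sqrt{\pi/2}\,S(|x-\bar{x}|)$ and $n\sigma_2^2 = S(x-\bar{x})^2$, together with the exact second moment $E(\sigma_1^2) = \sigma^2\frac{n-1}{n^2}\{\frac{\pi}{2}+\sqrt{n(n-2)}+\sin^{-1}\frac{1}{n-1}\}$; the function names are hypothetical and $\sigma$ is set to 1.

```python
import math


def var_sigma1(n):
    """Exact variance of the mean-deviation estimate sigma_1 (sigma = 1)."""
    second_moment = (n - 1) / n ** 2 * (
        math.pi / 2.0 + math.sqrt(n * (n - 2)) + math.asin(1.0 / (n - 1))
    )
    mean_squared = (n - 1) / n  # E[sigma_1] = sqrt((n - 1) / n)
    return second_moment - mean_squared


def var_sigma2(n):
    """Exact variance of the mean-square estimate sigma_2 (sigma = 1)."""
    log_ratio = math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    mean_squared = (2.0 / n) * math.exp(2.0 * log_ratio)
    return (n - 1) / n - mean_squared


def cv_sigma1(n):
    """Coefficient of variation of sigma_1."""
    return math.sqrt(var_sigma1(n)) / math.sqrt((n - 1) / n)


def cv_sigma2(n):
    """Coefficient of variation of sigma_2."""
    mean = math.sqrt((n - 1) / n - var_sigma2(n))
    return math.sqrt(var_sigma2(n)) / mean


# For large n the ratio of variances approaches pi - 2, about 1.1416.
large_n_ratio = var_sigma1(10_000) / var_sigma2(10_000)
```

Since the ratio of the variances tends to $\pi - 2$, the ratio of the standard errors tends to $\sqrt{\pi-2}$, which is the statement made in the text.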

4. The derivate of minimum error.

The correlation between deviations, taken positive, being of
order $\frac{1}{n^2}$, does not, as we have seen, affect the expression for the
variance of a derivate, when $n$ is large, because this is of order
$\frac{1}{n}$. We may, then, in examining the comparative variance of
different derivates ignore this correlation, and treat the deviations
as though they were independent. The term “variance” is used
here as elsewhere to signify the square of the standard deviation
or standard error; by the “relative variance” is intended the same
quantity divided by the square of the mean.

It is easy to verify that, although the variance is diminished 
as we pass from the derivate of the first power to that of the second, 
it is increased as we pass from the second power to the third. It 
is, therefore, of interest to determine for what power it is actually 
a minimum. 

If $\mu_p$ is the mean value of $x^p$, then

$$\mu_p = \frac{2^{\frac{p}{2}}}{\sqrt{\pi}}\left(\frac{p-1}{2}\right)!\;\sigma^p;$$

* Mr. Fisher kindly allows me to correct here an erroneous statement in
my book, Stellar Movements, p. 147, footnote. I think it accords with the
general experience of astronomers that, for the errors commonly occurring in
practice, the mean error is a safer criterion of accuracy than the mean square
error, especially if any doubtful observations have been rejected; but I was
wrong in claiming a theoretical advantage for the mean error in the case
of a truly Gaussian distribution. My formulae were somewhat different from
Mr. Fisher’s, since I considered the deviations of $\sigma_1$ from $\sigma$ instead of from
$\bar{\sigma}_1$; but, as he points out, this correction (as I considered it) is of minor
importance, and my mistake arose in the numerical evaluation of the results.
—A. S. Eddington. 



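if, then, $\sigma_p$ is the derivate of the $p$th power, we take as definition

$$\sigma_p^{\,p} = \frac{\sqrt{\pi}}{2^{\frac{p}{2}}\left(\frac{p-1}{2}\right)!}\cdot\frac{1}{n}\,S(x^p).$$

It is well known that the variance of the $p$th moment of a sample
of $n$ is

$$\frac{\mu_{2p}-\mu_p^2}{n}$$

when $n$ is large; whence it follows that the relative variance of
$\sigma_p^{\,p}$ is

$$\frac{1}{n}\left\{\frac{\mu_{2p}}{\mu_p^2}-1\right\},$$

and, therefore, that the relative variance of $\sigma_p$ is

$$\frac{1}{p^2 n}\left\{\frac{\mu_{2p}}{\mu_p^2}-1\right\} = \frac{1}{p^2 n}\left\{\frac{\sqrt{\pi}\,\left(p-\frac12\right)!}{\left(\left(\frac{p-1}{2}\right)!\right)^2}-1\right\}.$$

Putting $p = 1$, 2, and 3, we have $\frac{\pi-2}{2n}$, $\frac{1}{2n}$, and $\frac{15\pi-8}{72n}$, being
in the ratio $1.1416 : 1 : 1.0868$.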
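
The three relative variances, and the position of the minimum, can be verified numerically. The sketch below assumes the large-sample form $\frac{1}{p^2 n}\{\mu_{2p}/\mu_p^2 - 1\}$ with the absolute moments of a normal distribution, $\mu_{2p}/\mu_p^2 = \sqrt{\pi}\,\Gamma(p+\frac12)/\Gamma(\frac{p+1}{2})^2$; the function name and the search grid are hypothetical.

```python
import math


def n_times_relative_variance(p):
    """n times the large-sample relative variance of the p-th power derivate.

    Uses mu_2p / mu_p^2 = sqrt(pi) * Gamma(p + 1/2) / Gamma((p + 1) / 2)^2
    for the absolute moments of a normal distribution.
    """
    log_ratio = math.lgamma(p + 0.5) - 2.0 * math.lgamma((p + 1.0) / 2.0)
    moment_ratio = math.sqrt(math.pi) * math.exp(log_ratio)
    return (moment_ratio - 1.0) / p ** 2


base = n_times_relative_variance(2.0)              # 1/2
ratio_p1 = n_times_relative_variance(1.0) / base   # about 1.1416
ratio_p3 = n_times_relative_variance(3.0) / base   # about 1.0868

# Coarse grid search over 1 <= p <= 3 for the power of least relative variance.
grid = [1.0 + k / 100.0 for k in range(201)]
best_p = min(grid, key=n_times_relative_variance)
```

On this grid the least relative variance falls at $p = 2$, the mean square deviation.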

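If the relative variance is a minimum for variations of $p$, then
must

$$\frac{\sqrt{\pi}\,\left(p-\frac12\right)!}{\left(\left(\frac{p-1}{2}\right)!\right)^2}-1 = \frac{p}{2}\cdot\frac{\sqrt{\pi}\,\left(p-\frac12\right)!}{\left(\left(\frac{p-1}{2}\right)!\right)^2}\left\{\frac{d}{dp}\log\left(p-\tfrac12\right)! - 2\,\frac{d}{dp}\log\left(\frac{p-1}{2}\right)!\right\}.$$

Now, when $p = 2$, the factor outside the right-hand bracket
reduces to 3, and the left-hand side of the equation is therefore 2.
The right-hand bracket may be evaluated from the definite integral,

$$\frac{d}{dx}\log x! = \int_0^\infty\left(\frac{e^{-t}}{t}-\frac{e^{-xt}}{e^{t}-1}\right)dt;$$

writing successively $1\frac12$ and $\frac12$ for $x$ as in the bracket, there
remains

$$\int_0^\infty e^{-\frac{3t}{2}}\,dt = \frac{2}{3}.$$

The equation is therefore satisfied, the relative variance having
its minimum value when $p$ is 2. The mean square deviation is
the derivate with the minimum relative variance.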

5. The distribution of pairs of values of $\sigma_1$ and $\sigma_2$, when $n = 4$.

Full knowledge of the effects of using one rather than another
of two derivates can only be obtained from the frequency surface
of pairs of values of the two derivates. By integration with
respect to one derivate or the other, the two frequency curves can
be obtained and compared, in respect of any quality which may
be in question; but the additional information supplied by the
mutual frequency surface is essential to a thorough examination
of the question.

When $n$ is large the problem is simplified by the fact that both
curves rapidly approach the normal form centred about the true
value as mean; the only possible difference between such curves
is in the standard deviation. For small values of $n$ the case is
much more complicated; failing a complete expression for the
frequency surface of $\sigma_1$ and $\sigma_2$ in terms of $n$, it will be best to
investigate this surface in the single case, $n = 4$. This single case
will be found sufficient to bring out the decisive features of the
general surface.

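If $x_1$, $x_2$, $x_3$ and $x_4$ are the four observations, then the chance
of all four observations falling into their respective elementary
ranges is

$$\frac{1}{(\sigma\sqrt{2\pi})^4}\; e^{-\frac{S(x-m)^2}{2\sigma^2}}\, dx_1\, dx_2\, dx_3\, dx_4.$$

As in Article 2, this frequency density is the product of two factors,
one depending only on $\bar{x}$,

$$\frac{2}{\sigma\sqrt{2\pi}}\; e^{-\frac{2(\bar{x}-m)^2}{\sigma^2}}\, d\bar{x},$$

and the other on $\sigma_2$,

$$\frac{1}{(\sigma\sqrt{2\pi})^3}\; e^{-\frac{2\sigma_2^2}{\sigma^2}}\, dv,$$

in which $dv$ stands for the element of volume in the plane three-dimensional
space,

$$S(x-\bar{x}) = 0.$$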

Within this space the values of $(x-\bar{x})$ will be positive or
negative according as the representative point lies on one side or
the other of four planes, through M, drawn parallel to the faces
of a regular tetrahedron. The surface of a sphere with M as centre
is therefore divided into 14 areas (all combinations of sign being
possible except all positive and all negative). Of these areas 6
are regular four-sided figures, including the cases in which two
deviations are positive and two negative; the remaining 8, regular
three-sided figures, include the cases in which one deviation is of
opposite sign to the other three. All the sides of these figures
are 60° and the angles $\cos^{-1}\left(\pm\frac13\right)$. The fraction of the total area
included in the four-sided figures is

$$3-\frac{6}{\pi}\cos^{-1}\frac13 = .6490,$$

and the remainder in the three-sided areas is

$$\frac{6}{\pi}\cos^{-1}\frac13 - 2 = .3510.$$
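
The two fractions can be checked from the angle-excess formula for the area of a spherical polygon on the unit sphere. The sketch assumes, consistently with the description above, that the triangles have angles $\cos^{-1}\frac13$ and the quadrangles angles $\cos^{-1}(-\frac13)$; the variable names are hypothetical.

```python
import math

alpha = math.acos(1.0 / 3.0)      # angle of each spherical triangle
beta = math.pi - alpha            # angle of each spherical quadrangle

# Area of a spherical polygon = sum of its angles - (sides - 2) * pi.
triangle_area = 3.0 * alpha - math.pi
quadrangle_area = 4.0 * beta - 2.0 * math.pi

sphere_area = 4.0 * math.pi
fraction_quadrangles = 6.0 * quadrangle_area / sphere_area  # two + and two -
fraction_triangles = 8.0 * triangle_area / sphere_area      # three against one
```

The six quadrangles and eight triangles together cover the whole sphere, so the two fractions must sum to unity.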

The distribution of $\sigma_1$ is different in these two groups of regions.



When two of the deviations are positive and two negative, $\sigma_1$
will be constant on a plane space of the type,

$$(x_1-\bar{x})+(x_2-\bar{x})-(x_3-\bar{x})-(x_4-\bar{x}) = 4\sigma_1\sqrt{\frac{2}{\pi}}.$$

This space is at right angles to that which we have considered,
namely

$$S(x-\bar{x}) = 0,$$

and its least distance from the centre therefore lies in that space.
This distance is $2\sigma_1\sqrt{\frac{2}{\pi}}$ and passes through the centre of figure
of the spherical quadrangle in which the representative point lies.
If $\theta$ be the angular distance of this point from the centre of figure

$$2\sigma_2\cos\theta = 2\sigma_1\sqrt{\frac{2}{\pi}},$$

i.e.

$$\cos\theta = \frac{\sigma_1}{\sigma_2}\sqrt{\frac{2}{\pi}}. \qquad \text{IIa.}$$

In the triangular regions the plane space over which $\sigma_1$ is
constant, such as

$$(x_1-\bar{x})+(x_2-\bar{x})+(x_3-\bar{x})-(x_4-\bar{x}) = 4\sigma_1\sqrt{\frac{2}{\pi}},$$

is not perpendicular to the space in which the representative
point lies, but makes with it an angle of 60°, the distance from
the centre to the plane of intersection is therefore

$$2\sigma_1\sqrt{\frac{2}{\pi}}\;\operatorname{cosec} 60^\circ,$$

whence

$$\cos\theta = \frac{2}{\sqrt3}\,\frac{\sigma_1}{\sigma_2}\sqrt{\frac{2}{\pi}}. \qquad \text{IIb.}$$


The frequency distribution of $\sigma_1$, for a given value of $\sigma_2$, is
thus reduced to the frequency of occurrence of different values
of $\theta$ in two types of spherical figures. For the quadrangles the
greatest possible value of $\theta$ is 45°, while the least distance from
the perimeter to the centre is $\sin^{-1}\frac{1}{\sqrt3}$. From these 6 regions
we have

From $0$ to $\sin^{-1}\frac{1}{\sqrt3}$: frequency $3\sin\theta\, d\theta$

From $\sin^{-1}\frac{1}{\sqrt3}$ to 45°: frequency $3\sin\theta\left(1-\frac{4}{\pi}\cos^{-1}\frac{1}{\sqrt2\,\tan\theta}\right)d\theta$

The greatest value of $\theta$ in the triangles is $\sin^{-1}\frac{1}{\sqrt3}$, and the
least distance from the perimeter to the centre is $\sin^{-1}\frac13$, therefore
from 8 such regions we have

from $0$ to $\sin^{-1}\frac13$: frequency $4\sin\theta\, d\theta$

from $\sin^{-1}\frac13$ to $\sin^{-1}\frac{1}{\sqrt3}$: frequency $4\sin\theta\left(1-\frac{3}{\pi}\cos^{-1}\frac{1}{\sqrt8\,\tan\theta}\right)d\theta$

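Substituting for $\theta$ in expression II., we obtain for the distribution
of $\sigma_1$ for a given value of $\sigma_2$,

from $\sigma_2\sqrt{\frac{\pi}{3}}$ to $\sigma_2\sqrt{\frac{\pi}{2}}$: frequency $3\sqrt{\frac{2}{\pi}}\,\frac{d\sigma_1}{\sigma_2}$

from $\sigma_2\sqrt{\frac{\pi}{4}}$ to $\sigma_2\sqrt{\frac{\pi}{3}}$: frequency $3\sqrt{\frac{2}{\pi}}\left(1-\frac{4}{\pi}\cos^{-1}\frac{\sigma_1}{\sqrt{\pi\sigma_2^2-2\sigma_1^2}}\right)\frac{d\sigma_1}{\sigma_2}$

for the quadrangles, and for the triangles

from $\sigma_2\sqrt{\frac{\pi}{3}}$ to $\sigma_2\sqrt{\frac{3\pi}{8}}$: frequency $8\sqrt{\frac{2}{3\pi}}\,\frac{d\sigma_1}{\sigma_2}$

from $\sigma_2\sqrt{\frac{\pi}{4}}$ to $\sigma_2\sqrt{\frac{\pi}{3}}$: frequency $8\sqrt{\frac{2}{3\pi}}\left(1-\frac{3}{\pi}\cos^{-1}\frac{\sigma_1}{\sqrt{3\pi\sigma_2^2-8\sigma_1^2}}\right)\frac{d\sigma_1}{\sigma_2}$ III.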

These two frequency curves are shown on opposite sides of the 
same base in fig. 1. The existence of two or more distinct curves
according to the partition of the observations by the mean, and the 
existence of one or more discontinuities are no doubt characteristic 
of such curves in general. For higher values of n, the right-hand 
side would no longer be truncated, but would meet the base with 
increasingly high contact. 

From the expressions III. may be obtained the frequency
surface. For the frequency in the range $d\sigma_2$ being, by Article 2,

8 V '2 

Vtt * 

the frequency in the range $d\sigma_1\,d\sigma_2$ is


V 2 0-2 ^3 ^ 

y 3 y 4 ^ 


'f. _ 4 cos- ) . 

^ »r a/tto-o* - 2 <r.v 




2.767 

Determining the Accuracy of an Observation. 767


when two observations lie on each side of the mean, and 
128 
3 ^n/3 


V 8 y 3 

y 3 ^2 y 4 


^ dtr^da-2 




"•V 3 


\ TT 


V 3w 


- 8cr, 




IVb. 


when three are on one side and one on the other. 


FREQUENCY DISTRIBUTION OF 



In expressions IV. we have the complete account of the facts
of random sampling respecting the two variables $\sigma_1$ and $\sigma_2$; it will
be of interest first to obtain the frequency curves of $\sigma_1$.

6. The distribution of $\sigma_1$.

The expressions IV. may be integrated with respect to $\sigma_2$, over
its whole range of variability from 0 to $\infty$; by so doing we arrive
at two frequency curves corresponding to the two partitions of the
observations,

1 2 

TT 

and


i,_4. t:*.'’ ■« I'-, 

r ^ ' .«> ■ /2 ^ , V 


\ <r 


Va. 


, _ 3 V ^3 

irV 3 ' ’>■ 


air,* fl g<r,*<* 
jtro* I 


dt 

+ 1J 


Vb. 










2.768 


Of these curves it is easy to calculate the moments, and thence
to find those of the compound curve obtained by throwing them
together, as is done if we consider the distribution of $\sigma_1$ without
regard for the manner in which the observations are parted by
the mean.

The moments of these three curves and of the corresponding
curve for $\sigma_2$ are shown in the following table:



Vx. 

A. 

ff 

B. 

O’,. 

Total. 

O’,. 


3 . 6 “ 

2 




IT 

IT 




1 

2^3 

I 

3 

V! 




*(-.)• jJ; 




96v'3 + ''* 


V! 

IlJv* 

^(,r - 2a) f 

*7'/, \ ■*9 

Sr2(5'-7«) + 38^V5 

H 


$\alpha$ here stands for $\cos^{-1}\frac{1}{3}$; of the numerical values we need
only cite the coefficient of variation, .4296 for $\sigma_1$ and .4220 for
$\sigma_2$. The derivate of the second power is less variable even for
small values of $n$, but the difference in weight in favour of $\sigma_2$ is
increased fourfold when $n$ is made large. The curve for $\sigma_1$ has
not only a larger coefficient of variation, it is also more skew; $\beta_1$
is .297 against .238 and $\beta_2$ is 3.28 against 3.11.
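The coefficients of variation quoted for samples of four can be checked roughly by computation. The sketch below is an editorial illustration under stated assumptions: $\sigma_2$ is taken to be proportional to the root of the sum of squared deviations from the sample mean (so that for $n = 4$ it follows a $\chi$ distribution with 3 degrees of freedom), $\sigma_1$ is taken to be proportional to the sum of absolute deviations, and the latter is estimated by simulation because no closed form is given here. Function names, the seed and the number of trials are arbitrary choices.

```python
import math
import random

def cv(values):
    """Coefficient of variation: standard deviation divided by mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean

def simulate(n=4, trials=100_000, seed=1920):
    """Monte Carlo coefficients of variation for the two estimates of sigma."""
    rng = random.Random(seed)
    abs_sums, root_squares = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        abs_sums.append(sum(abs(x - m) for x in xs))              # proportional to sigma_1
        root_squares.append(math.sqrt(sum((x - m) ** 2 for x in xs)))  # proportional to sigma_2
    return cv(abs_sums), cv(root_squares)

# Exact value for sigma_2: a chi variate with 3 degrees of freedom.
chi_mean = math.sqrt(2) * math.gamma(2.0) / math.gamma(1.5)
cv2_exact = math.sqrt(3 - chi_mean ** 2) / chi_mean

cv1_mc, cv2_mc = simulate()
print(round(cv2_exact, 4), round(cv1_mc, 3), round(cv2_mc, 3))
```

Because both quantities are scale-free, the constant multipliers in the definitions of $\sigma_1$ and $\sigma_2$ do not affect the comparison.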

7. Unique properties of $\sigma_2$.

So far the variables have been compared only in respect of
the quantitative characters of their frequency distributions. There
exists also in the form of the frequency surface (IV.) a qualitative
distinction, which reveals the unique character of $\sigma_2$.

From the manner in which the frequency surface has been
derived, as in expressions III., it is evident that:

For a given value of $\sigma_2$, the distribution of $\sigma_1$ is independent
of $\sigma$.

On the other hand, it is clear from expressions (IV.) and (V.)
that for a given value of $\sigma_1$ the distribution of $\sigma_2$ does involve $\sigma$.
In other words, if, in seeking information as to the value of $\sigma$, we
first determine $\sigma_1$, then we can still further improve our estimate
by determining $\sigma_2$; but if we had first determined $\sigma_2$, the
frequency curve for $\sigma_1$ being entirely independent of $\sigma$, the actual
value of $\sigma_1$ can give us no further information as to the value of $\sigma$.
The whole of the information to be obtained from $\sigma_1$ is included
in that supplied by a knowledge of $\sigma_2$.

This remarkable property of $\sigma_2$, as the methods which we
have used to determine the frequency surface demonstrate, follows
from the distribution of frequency density in concentric spheres




2.769 


over each of which $\sigma_2$ is constant. It therefore holds equally
if $\sigma_3$ or any other derivate be substituted for $\sigma_1$. If this is so,
then it must be admitted that:

The whole of the information respecting $\sigma$, which a sample
provides, is summed up in the value of $\sigma_2$.

This unique superiority of $\sigma_2$ is dependent on the form of the
normal curve,

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\,dx,$$

which leads to a frequency density in generalised space
distributed on concentric spheres. Since it is sometimes urged in
favour of the Mean Error that it gives less weight to the large
deviations, and that these large deviations do in fact occur in
excess of the normal expectation, it is of interest to see if any
curve is related to the Mean Error in the same way as is the
normal curve to the Mean Square Error.

The somewhat artificial curve

$$\frac{1}{\sigma\sqrt{2}}\,e^{-\frac{|x|\sqrt{2}}{\sigma}}\,dx$$

replaces the generalised spheres by generalised octahedra, upon
the surfaces of which $\sigma_1$ is constant, provided

For large
values of $n$ this condition is sufficiently approached and $\sigma_1$ may
be taken as the ideal measure of $\sigma$ for curves of this type. When
$n$ is small, and allowance has to be made for the aberrations of
$\bar{x}$, the figure on which $\sigma_1$ is constant is the central section of a




2.770 


generalised octahedron, as was seen in the case $n = 4$, where
the figure over which $\sigma_1$ is constant was found to be bounded
by six squares and eight equilateral triangles; while the surface
of equal probability is in general an eccentric section in which
the squares become rectangles and the triangles are not all equal.

When $n$ is large, however, it does not seem unreasonable to
employ $\sigma_1$ for samples from curves which resemble the above
rather than the normal curve. The value of $\beta_2$ (the ratio of the
fourth moment to the square of the second moment) seems well
fitted to provide a test. If this is near to 3 the Mean Square
Error will be required; if, on the other hand, it approaches 6, its
value for the double exponential curve, it may be that $\sigma_1$ is a
more suitable measure of dispersion. It should not be forgotten,
however, that the factor $\sqrt{\pi/2}$ in the formula for $\sigma_1$ is derived
from the normal curve of errors. The corresponding factor for
the double exponential curve is $\sqrt{2}$, about 12 per cent. bigger.
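The figures in this paragraph follow from the standard moments of the two curves, and can be reproduced in a few lines. The sketch below is an editorial illustration; it assumes the double exponential curve is parametrised by a scale $b = \sigma/\sqrt{2}$, for which the $k$-th absolute moment is $k!\,b^k$, and the variable names are hypothetical.

```python
import math

sigma = 1.0  # common standard deviation of both curves

# Normal curve: mean absolute deviation is sigma * sqrt(2/pi).
normal_mean_abs = sigma * math.sqrt(2 / math.pi)
normal_factor = sigma / normal_mean_abs          # sqrt(pi/2)
beta2_normal = 3 * sigma ** 4 / (sigma ** 2) ** 2

# Double exponential curve with standard deviation sigma: scale b = sigma / sqrt(2).
b = sigma / math.sqrt(2)
laplace_mean_abs = math.factorial(1) * b          # first absolute moment
laplace_factor = sigma / laplace_mean_abs         # sqrt(2)
beta2_laplace = (math.factorial(4) * b ** 4) / (math.factorial(2) * b ** 2) ** 2

# How much bigger the double exponential factor is than the normal one.
ratio = laplace_factor / normal_factor
print(beta2_normal, beta2_laplace, ratio)
```

The ratio of the two factors is $2/\sqrt{\pi}$, which is the excess of roughly an eighth referred to in the text.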



3.106a 


3 

STUDIES IN CROP VARIATION. I. AN EXAMINATION
OF THE YIELD OF DRESSED GRAIN
FROM BROADBALK


AUTHOR’S NOTE

In the author’s early work at Rothamsted much attention was given
to the massive records of weather, crop yields, crop analyses, etc.,
which had been accumulated during the long history of that research
station. The material was obviously of unique value for such problems
as that of ascertaining to what extent meteorological readings
were capable of supplying a prediction of the crop yields to follow.

The present paper was the first of a series devoted to this end, and
consists of an examination of the yields of selected plots in Broadbalk
wheat field with a view to ascertaining the principal components
of their variability as a preliminary to the study of meteorological
effects. The use of orthogonal polynomials is extensively developed
in this and the following papers as affording the means of
isolating groups of components susceptible to separate explanations.
An important feature of these wheat yields consists in slow changes
too consistent to be ascribable to the immediate effects of annual
fluctuations in the weather, and differing in kind from the progressive
soil deterioration recognisable on plots with insufficient or unbalanced
manuring. Evidence is produced converging to the conclusion
that varying weed infestation representing a cumulative integration
both of seasonal weather and of cultivation practice was a
major factor in these slow changes.


* Reprinted from Journal of Agricultural Science, Vol. XI, Pt. II, pp. 107-135,
1920.



3.107 


STUDIES IN CROP VARIATION. 

I. AN EXAMINATION OF THE YIELD OF DRESSED GRAIN 
FROM BROADBALK. 

By R. A. FISHER, M.A.

(Statistical Laboratory, Rothamsted Experimental Station, Harpenden.)

(With Three Figures in Text.) 

I. The Variation in Wheat Yield. 

1. Introductory.

The crop records available at Rothamsted extend back for over 70 
years. In Broadbalk wheat field 13 plots have been continuously under 
uniform treatment since 1852. The length of this unique series of observations
is of special value for statistical purposes. The average yields
obtained from the Board of Agriculture’s Reports cover only half the 
period; even the Woburn records date only from 1877. The series of 
yields of dressed grain from the different plots of Broadbalk was 
examined from 1852, when uniform treatment commenced, until 1918, 
the last season available when the examination was begun. 

Several minor alterations have been made from time to time in the 
manurial treatment, even in the selected plots, but in the main it has 
remained uniform. 

(a) For the first 7 years plots 5, 6, 7, 8, 13, and 17 or 18, received
300 lbs. per acre of sulphate of potash, of which for the subsequent
60 years they received 200 lbs. per acre. At the same time the sulphate
of soda was cut down from 200 to 100 lbs. per acre on plots 5, 6, 7, 8,
16, and 17 or 18, and from 550 to 366½ lbs. per acre on plot 12; while the
sulphate of magnesia on plot 14 was reduced from 420 to 280 lbs. per
acre.

(b) For two years, 1862 and 1863, all plots, except half of plot 3,
received 400 lbs. per acre of a mixture of silicates of soda and lime, and
for 3 subsequent years, one half of plots 5, 6, 7, 8, 17 or 18 received 
228 lbs. per acre of this mixture. A comparison of the corresponding 



3.108 



sub-plots shows that this dressing had little effect, being, if anything, 
deleterious to the sub-plots receiving it. 

(c) For 12 years, 1868 to 1879, the chopped straw from each sub-plot
was ploughed in, on half of plots 5, 6, 7, 8, 11, 12, 13, 14, and 17
or 18. A comparison of the corresponding sub-plots shows that in this
case also the effect upon the mean yield may be ignored in the analysis
of the majority of the plots. Plots 11, 12, and 14, in which the deficiency
of potash is a limiting factor, showed a sensible benefit from the straw,
and for these the series (without straw) has been used from 1868.

(d) For 5 years, 1898 to 1902, a dressing of 400 lbs. per acre of basic 
slag was substituted for the previous dressing of 392 lbs. of superphosphate,
on all plots receiving phosphate. No statistical evidence
can be adduced as to the effect of this change, and it is assumed that the 
dressings were effectively equivalent. 

(e) For 1916 the supply of potassium sulphate was reduced to two-thirds
of its previous dressing, the ordinary amount of potash being
made up by the use of wood ashes. In 1917 and 1918 potassium sulphate 
was omitted altogether from all the plots, and owing to the insufficiency 
of wood-ash the deficiency was not made up. In order to continue the 
comparative treatment of plots 12, 13 and 14, the sodium sulphate on 
12 and the magnesium sulphate on 14 were omitted for the same year. 
From 1920 the original dressings have been resumed. 

(f) From time to time small areas have been incorporated in, or
excluded from, the experimental plots. These changes can have had but
little influence upon the yield per acre, which has always been reckoned
on the actual area of each plot; the changes in size have not been large,
and any new portion has received similar treatment to that of the main
plot, for some years before incorporation.

Changes of Variety.

The variety of wheat seed employed has not been kept the same. 
The following varieties are recorded : 

1852. Old Red Cluster. 

1853-81. Red Rostock, 29 years. 

1882-99. Red Club, 18 years. 

1900-1916. Squarehead’s Master, 13 years.

1917 onwards. Red Standard. 

Between 1900 and 1912 the varieties were changed more frequently. 
Squarehead’s Master has been principally used, but intermitted with 
Giant Red (1905), Browick Red (1910), Little Joss (1911, 1912). Red 



3.109 



Standard was adopted in 1917 and will be used in future. When the 
varieties are changed infrequently, any effect due to genetic difference 
of constitution would be included in the slow changes. During the latter 
period it would appear partly as annual variation. That these genetic 
differences are not at any rate a principal cause of the slow changes 
observed, may be seen from the great changes in mean yields which 
occurred during the use of Red Rostock. 

2. The causes of variation in wheat yields.

From the series of observations it is possible to distinguish three 
types of variation in the wheat yield: (1) annual variation, (2) steady¹
diminution due to deterioration of the soil, (3) slow changes other than
steady diminution. The annual variations may be ascribed primarily to
the weather; including in that term not only the direct effects of meteorological
conditions in stimulating plant growth, but also the physical
effects wrought upon the soil, such as the washing out of plant nutrients 
and the indirect effects of light, temperature and moisture in stimulating 
or retarding the increase of bacteria, protozoa and of the fungal and 
algal flora of the soil, all of which may be supposed to adjust their 
activities rapidly to the meteorological conditions. The steady diminution 
of yield may unhesitatingly be ascribed to deterioration of the soil; 
either, as in plot 10, to the exhaustion of natural supplies of potash and 
phosphorus, or, in other cases, perhaps, to that of unknown substances 
required in small quantities, and not supplied in the artificial manure, 
or to physical changes as yet but little understood, or, in plot 5, to the 
gradual exhaustion of the power of the soil of producing nitrates in the 
soil moisture. The third class of change is unexpected, and it is not easy 
to assign it entirely to any one cause. To establish the existence of large 
changes in the mean yield, to show how they may be disentangled from 
the other types of change, and to suggest their possible cause is the 
purpose of the present paper. 

3. Mean Yield and Annual Diminution.

In Table I is shown the mean yield and average annual diminution 
of mean yield, in bushels per acre, for the 13 plots here considered. 

¹ The deterioration must not be assumed to be mathematically linear, although it is
here represented by a linear function; on most plots it was probably more rapid in the 
earlier years than it was later, as is indicated by the parabolic term being on the whole 
more inclined to positive values on those plots in which the deterioration is more rapid. 
The true curves of deterioration cannot, however, be disentangled from the slow changes 
which have taken place owing to other causes. 



3.110 


Table I.

                                        Manure per acre (lbs.)
                              Sulphate  Sulphate  Sulphate  Super-     Sulphate  Chloride  Mean yield  Mean annual
                              of        of        of        phosphate  of        of        (bushels    diminution   P for
Plot                          potash    soda      magnesia             ammonia   ammonia   per acre)   (bushels     deterioration
                                                                                                       per acre)

2b, dung 14 tons              -         -         -         -          -         -         34.549      .031         .41
3 and 4, no manure            -         -         -         -          -         -         12.260      .097         .000,001,4
5                             200       100       100       392        -         -         14.180      .090         .000,000,8
6                             200       100       100       392        100       100       22.581      .141         .000,11
7                             200       100       100       392        200       200       31.367      .144         .002,1
8                             200       100       100       392        300       300       35.694      .092         .056
10                            -         -         -         -          200       200       19.504      .157         .000,25
11                            -         -         -         392        200       200       22.046      .219         .000,003
12                            -         366½      -         392        200       200       28.319      .181         .000,35
13                            200       -         -         392        200       200       30.209      .123         .009,1
14                            -         -         280       392        200       200       27.766      .231         .000,000,6
17 and 18 (alternate):
  minerals                    200       100       100       392        -         -         14.610      .092         .002,8
  ammonia                     -         -         -         -          200       200       29.006      .114         .005,6


In the last column, P represents the probability of a larger annual 
diminution occurring by chance owing to the later seasons happening 
to be on the average less favourable than the earlier ones. In calculating 
P it has been assumed that there has been no real deterioration of the 
average weather, an assumption which is tested in Section 11. 

The values show that all but the dunged plot, 2b, have suffered
sensible deterioration. Even plot 8, which receives “complete” artificial
manures, shows a deterioration which would not be expected more than
once in eighteen random trials. It is, therefore, probably real, as are
certainly all the others.

4. Slow changes in mean yield.

It becomes apparent on inspection of the actual yields that the 
changes in mean yield are by no means fully expressed as simple 
deterioration. The mean yield rises up to about 1860, and after a bad 
period in the seventies reaches a second maximum in the nineties. The 
probability of such large fluctuations occurring by chance may be 
calculated by the methods of Section 8, on the assumption, as above, 
that the incidence of good and bad seasons has not been orderly but 
fortuitous. 

When the variation of any quantity (variate) is produced by the 
action of two or more independent causes, it is known that the variance 
produced by all the causes simultaneously in operation is the sum of the 
values of the variance produced by each cause separately. The variance 
is defined as the mean square deviation of a variate from its mean, and
is therefore the square of its standard deviation. The above property of 



3.111 



the variance, by which each independent cause makes its own contribution
to the total, enables us to analyse the total, and to assign, with more
or less of accuracy, the several portions to their appropriate causes, or 
groups of causes. In Table II is shown the analysis of the total variance 
for each plot, divided according as it may be ascribed (i) to annual 
causes, (ii) to slow changes other than deterioration, (iii) to deterioration; 
the sixth column shows the probability of larger values for the variance 
due to slow changes occurring fortuitously. 
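The additive property of the variance on which this analysis rests can be illustrated exactly with a small made-up example. In the sketch below the two lists of values are arbitrary illustrations and not data from the paper; each combination of the two causes is taken as equally likely, which is what independence amounts to here.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Mean square deviation of the values from their mean."""
    n = len(values)
    mean = sum(values, Fraction(0)) / n
    return sum((v - mean) ** 2 for v in values) / n

# Two hypothetical, independent causes, each with equally likely effects.
cause_a = [Fraction(v) for v in (-2, 0, 1, 5)]
cause_b = [Fraction(v) for v in (-3, 3, 4)]

# Every pairing of effects is equally likely when the causes are independent.
combined = [a + b for a, b in product(cause_a, cause_b)]

print(variance(cause_a), variance(cause_b), variance(combined))
```

With exact fractions the variance of the combined effects equals the sum of the separate variances, with no rounding error.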

Table II.

                          Annual    Slow                               P for slow
Plot                      causes    changes   Deterioration   Total    changes

2b                        33.2      17.5        .4            51.1     .000,002
3 and 4                    9.3       2.6       3.5            15.4     .004,3
5                         11.7       2.7       3.0            17.5     .007,6
6                         30.6       8.0       7.5            46.1     .003,1
7                         50.3      13.3       7.8            71.4     .001,8
8                         53.2      13.9       3.2            70.3     .000,68
10                        41.8       3.7       9.2            54.7     .26
11                        50.2       4.2      18.0            72.4     .27
12                        58.4       7.8      12.2            78.4     .086
13                        50.7       8.3       5.7            64.6     .040
14                        49.2       6.3      20.0            75.4     .10
17 and 18, minerals       21.4       4.6       3.1            29.1     .011
17 and 18, ammonia        38.2       9.0       4.9            52.1     .006,4
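Two internal checks on Table II can be sketched briefly: the three portions should add up to the total, and the portion ascribed to deterioration should agree with the variance of a straight line of slope equal to the mean annual diminution of Table I. The second check is an editorial interpretation: it assumes that the linear term over $n = 67$ seasons contributes the square of the slope multiplied by $(n^2-1)/12$. Only plots 7 and 12 are used, and the dictionary layout is a hypothetical convenience.

```python
N_YEARS = 67  # seasons 1852 to 1918 inclusive

# Values as printed in Tables I and II for two plots.
rows = {
    "7": {"annual": 50.3, "slow": 13.3, "deterioration": 7.8,
          "total": 71.4, "diminution": 0.144},
    "12": {"annual": 58.4, "slow": 7.8, "deterioration": 12.2,
           "total": 78.4, "diminution": 0.181},
}

# Mean square of t when t is measured from the mid-point of the series.
linear_factor = (N_YEARS ** 2 - 1) / 12

checks = {}
for plot, r in rows.items():
    parts = r["annual"] + r["slow"] + r["deterioration"]
    implied = r["diminution"] ** 2 * linear_factor
    checks[plot] = (parts, implied)
    print(plot, round(parts, 1), r["total"], round(implied, 2), r["deterioration"])
```

Agreement is only expected to within the rounding of the tabulated figures.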


The majority of the plots could be cited independently as evidence 
that the slow changes in the mean yield are not fortuitous. The less 
significant values from plots 10 to 14 are due not so much to the relative 
absence of slow changes, as to the great sensitiveness of these plots to 
annual causes of variation. 

The great variability of P is itself evidence that the slow change 
effect is of a different origin from that of the annual causes, for P is 
calculated from the ratio between these two portions. If, for instance, 
progressive alterations had been taking place in the average weather to 
an extent unmistakably distinct from independent annual variations 
occurring in random order, then we should expect all plots to show very 
small values of P, whereas, in fact, P is greatest on those plots most 
subject to meteorological disturbance; on the other hand, if, as appears 
to be the case, the succession of seasons is not distinguishable from a 
random order, then the very small values of P observed in some plots 
must have some explanation other than meteorological. 

The actual course and extent of the slow changes in mean yield,
other than soil deterioration, is best shown by a smooth curve, a polynomial
of the 5th degree, fitted to the series of yields of dressed grain


3.112 



from plot 2b (Fig. 1a). This plot is not only sensibly free from deterioration
but, as shown by the value of P, is least affected by accidental fluctuation 
due to good and bad seasons. In order to obtain data for the study of 
meteorological effects, it was necessary to eliminate as far as possible 



Fig. 1. Showing the course of the changes in mean yield in the continuously manured
plots of Broadbalk. The vertical ordinate is plotted on a logarithmic scale in order to
shew the proportionality of the slow changes in the several plots, and in order that the
relative importance of soil deterioration may be compared.




3.113 



variations of other than meteorological origin; for this purpose similar 
curves were fitted to all the plots, and the annual deviation of each 
plot, from the corresponding curves, will form the basis of a study of the 



effects of weather upon wheat yield. Fig. 1 shows that during two periods
of high yields, centred about 1860 and 1897, the mean yield of plot 2b
was about 40 bushels per acre, while in the depression about 1876 it had
fallen nearly to 30 bushels per acre, and in 1915 even lower. For special




3.114 



reasons (Section 9) the extreme values are not so reliable as the others; 
yet there can be little doubt that the recent minimum is already past, 
and that a genuine improvement took place in the early years of the 
experiment. 




3.115 




5. Yield and exhaustion in relation to manurial treatment. 

Before passing to the theory of the significance of these polynomials 
attention may profitably be drawn to the effect of different manurial 
treatment upon the mean yields and average annual decrements in the 
different plots. The probable errors refer in each case to the deviations 
to be expected from variations of weather occurring in random order. 

Tables III, IV, V and VI give the mean yield with the absolute and 
relative rate of decrement arranged to show best the effect of differences 
of manurial treatment. 


Table III.

                          Mean yield      Mean annual     Mean annual
                          (bushels        decrement       decrement
Plot                      per acre)       (bushels        %
                                          per acre)

5, no ammonia             14.18 ± .44     .090            .63 ± .16
6, single ammonia         22.58 ± .71     .141            .62 ± .16
7, double ammonia         31.37 ± .90     .144            .46 ± .15
8, treble ammonia         35.69 ± .93     .092            .26 ± .14


Plots 5, 6, 7 and 8 all receive the same dressing of superphosphate
and of sulphates of potash, soda and magnesia; they differ by successive
units of ammonium sulphate. The first increment is applied half in
autumn and half in spring, the subsequent increments in spring only.
The increases in mean yield due to these increments of ammonium
sulphate are 8.40, 8.77 and 4.34 bushels per acre, a series suggestive of
“diminishing returns” although not conforming to the geometric series
which has been proposed for such cases (Mitscherlich, 1909). The figures
for the mean annual decrement, in the second column, show that although
the mean of plot 6 exceeds that of plot 5 by 8.4, the latter plot has
gained on 6 during the 67 years under discussion, at the rate of .051
per annum. At this rate the difference would disappear in 165 years.
Naturally no such result is anticipated from such a prolonged experiment;
but this consideration serves to show that not only the mean
values, but the rates of decrement observed, are comparable only when
taken over the same period of years.
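The arithmetic behind the 165 years is a simple extrapolation and can be set out as follows. The sketch is an editorial illustration using the mean yields and mean annual diminutions of plots 5 and 6 as printed in Table I; the variable names are hypothetical.

```python
# Mean yields (bushels per acre) and mean annual diminutions from Table I.
mean_yield_5, mean_yield_6 = 14.180, 22.581
decrement_5, decrement_6 = 0.090, 0.141

gap = mean_yield_6 - mean_yield_5        # advantage of plot 6 over plot 5
closing_rate = decrement_6 - decrement_5  # bushels per acre gained by plot 5 each year
years_to_close = gap / closing_rate       # linear extrapolation only

# Relative decrements, as percentages of the mean yield (Table III, last column).
relative_5 = 100 * decrement_5 / mean_yield_5
relative_6 = 100 * decrement_6 / mean_yield_6

print(round(gap, 2), round(closing_rate, 3), round(years_to_close))
print(round(relative_5, 2), round(relative_6, 2))
```

The extrapolation is linear, which is exactly why the text warns against reading it as a forecast.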

It is to be expected that if the experiment were continued indefinitely, 
each plot would approach continually to a constant mean yield, but that 
these yields would differ materially for different manurial treatment. 
An approximation to the relations which finally exist between the mean 
yields of different plots may be obtained from the mean of the last nine 
years of the polynomial fitted to each plot. The values obtained are 



3.116 



10.77, 15.97, 23.57 and 30.16, for these four plots, and the differences
produced by the three successive increments are 5.20, 7.60 and 6.59.
In these figures there is no longer any evidence of “diminishing return.”
The comparative yields of these four plots have become the standard 
example of diminishing returns in agriculture, and in the earlier years 
of the experiment fully bore out the anticipations of the economists. 
For the first 9 years of the polynomial, for example, the successive 
increments were 9*27, 8*16 and 1*43, and we have seen that a similar 
effect, though less marked, still appears in the means of 67 years. It is 
only in recent years that the progressive deterioration of the less highly 
manured plots has gone so far as to make the third increment of the 
series exceed the first, and so to make apparent the fact that the benefit 
of the higher dressings was not wholly reaped in the immediate yield, 
but to some extent is long effective in maintaining the fertility of the 
soil at a higher level. 

The average annual decrements when set out as percentages of the 
mean yield show a progressive advantage of the nitrogenous manuring. 
Since all the plots have received soda, potash, magnesia, sulphates and 
phosphates, equally while greater weights of these ingredients have been 
removed from the more highly manured plots, the advantage of the latter 
may be safely ascribed to the supply of nitrogenous plant nutrients. The 
increasing effectiveness of the nitrogenous manure might be ascribed to 
the gradual utilisation of slowly available nitrogen compounds in the soil; 
this view, however, is not borne out by the comparative nitrogen analysis 
of these plots, which are available since 1865, and it is more probable 
that the heavier vegetation supported by plot 8 has had some more 
indirect beneficial effect, equivalent to a more ample supply of nitrogen. 
The greater root growth on the heavy yielding plots has perhaps increased
the effective depth of soil activity, or has supplied more abundantly
substances required for bacterial life. If the natural supplies of
nitrogen in plots 5 and 6 have become impoverished in these ways we
should expect to find the additional dressings of sulphate of ammonia 
to become, as they have become, an increasingly important factor in 
maintaining yield. 

The four plots (3 and 4), 5, 10 and 7 illustrate the supplementary 
action of nitrogenous compounds on one side and mineral manures on 
the other. The mean yield of 7 not only exceeds that of 10 by a greater 
amount than does 5 exceed (3 and 4), but it exceeds it in a higher ratio. 
In its absolute rate of decrement 10 exceeds 7 and (3 and 4) exceeds 5, 
although in both cases giving a lower mean yield. This can only mean 



3.117 




that a part of the decline of (3 and 4) and 10 is to be ascribed to the
progressive exhaustion of some of the mineral ingredients supplied to
5 and 7.


Table IV.

                                                 Mean annual     Annual
Plot                             Mean yield      decrement       decrement %

3 and 4, no manure               12.27 ± .39     .097            .79 ± .16
5, minerals                      14.18 ± .44     .090            .63 ± .16
10, ammonia                      19.50 ± .83     .157            .81 ± .22
7, ammonia and minerals          31.37 ± .90     .144            .46 ± .15
17 and 18, minerals              14.61 ± .59     .092            .63 ± .21
17 and 18, ammonia               29.01 ± .79     .114            .39 ± .14


The comparison with the two series from plots 17 and 18 shows that
the mineral series is very little better than 5, and much worse than 7,
showing that there is little residual effect of the ammonium salts applied
the previous year. That this small advantage is accompanied by greatly
increased variability appears from Table II, in which the variance due
to annual causes in plot (17 and 18) minerals is nearly double that of
plot 5; from the considerations of Section 14 this may be due to the
greater effects in variations of weed prevalence. The relative annual
decrement also agrees with 5, in being much greater than 7. On the
other hand, the ammonia series of (17 and 18) has a mean yield much
more close to 7 than to 10, showing that there is relatively little ultimate
loss of the mineral ingredients applied in previous years. The relative
decrement of (17 and 18) ammonia is even less than that of 7, owing
presumably to the smaller crops, averaging only about 21.8 bushels per
acre, which have been taken off these plots. The difference in annual
decrement, .07 per cent., is very small compared with the difference in
mean yield, of 9.6 bushels per acre; and this suggests that, in this case
also, if residual minerals could be tested year after year against minerals
freshly applied, the higher yielding series would display the less
deterioration.


Table V.

                                 Mean yield      Mean            Annual
                                 in bushels      annual          decrement
Plot                             per acre        decrement       %

11, no K, Na or Mg               22.05 ± .91     .219            .99 ± .21
12, sulphate of soda             28.32 ± .98     .181            .64 ± .18
13, sulphate of potash           30.21 ± .91     .123            .41 ± .16
14, sulphate of magnesia         27.76 ± .90     .231            .83 ± .17
7, all three sulphates           31.37 ± .90     .144            .46 ± .15


In comparing plots 7 and 13 it is surprising that the latter, which 
has slightly the lower mean yield, should have the lower relative rate 



3.118 



of decrement. This fact by itself suggests that the addition of sulphates 
of soda and magnesia to the sulphate of potash, stimulates production, 
perhaps by making more potash available, but causes more rapid exhaustion.
On the other hand, if we contrast either 12 or 14 with 11, the
actual production is much increased by the addition of sulphate of soda 
or magnesia and the relative annual decrement is reduced. 

In these figures it is not easy to distinguish any physical effect of
the large quantities of saline matter annually added to the plots. In
comparing plot 7 with 13 the former receives a large quantity of sulphates
of soda and magnesia, which, in view of the dressings of sulphate of
potash which both receive, might be expected to add little to the available
plant nutrients while adding much to the salinity. That the average
yield of plot 7 exceeds that of plot 13, suggests that even in this case the
nutrient advantage outweighs the effect of additional salinity, and it is
not clear that the greater deterioration of plot 7 is to be ascribed to the
latter cause, since this is also characteristic of plots 12 and 14 in which
it must be largely ascribed to the progressive exhaustion of soil potash,
which the sulphates of soda and magnesia have naturally facilitated.



Table VI.

                Mean yield      Mean            Mean annual
                in bushels      annual          decrement
Plot            per acre        decrement       %

2b              34.55 ± .74     .031            .09 ± .11
8               35.69 ± .93     .092            .26 ± .14


The dunged plot 2b can only be well compared with 8. Alone of the
plots it shows no significant diminution of yield; plot 8 comes third in
this respect, the probability of an increment or decrement as large or
larger being the result of a random distribution of favourable and
unfavourable years being .056 (or 17 to 1) for 8, against .41 (nearly even
odds) for 2b. On the average of 67 years 8 gives the higher mean, but
the difference, 1.14, is less than 2b gains on 8 during 20 years, so that
during the present century 2b has had the highest mean yield. In addition
to having the best sustained, and now the highest yield, 2b is the
least variable; the standard error of the annual decrement per cent. in
the other plots ranges from .14 to .22, in 2b it is .11. The crop on the
dunged plot is therefore better placed to do moderately in a bad year
of drought or excessive rain, than to take great advantage of a favourable
season.



3.119 




II. Theory of polynomial fitting.


6. Uncorrelated terms $T_r$.

If a quantity $x$ have values $x_1, x_2, \ldots, x_n$ at a number $n$ of successive
times, the general course of its changes may be represented by a polynomial,

$$a + bt + ct^2 + \ldots + kt^r,$$

in which $t$ represents the time. In this expression the coefficients
$a, b, c, \ldots, k$ will be altered if more or fewer terms of the series be used;
for instance, the value of $a$ will be different according as one takes 2 or
3 terms, for unless $c$ is zero the mean value of $ct^2$ will not be zero. If
the coefficients obtained are to be independent of the number of terms
employed, the successive terms must consist of polynomials of degree
0, 1, 2, ... $r$, which are mutually uncorrelated. Such uncorrelated
polynomials may be obtained uniquely in succession, for the term of
degree $r$ must fulfil $r$ conditions in order to be uncorrelated with the
preceding terms; these conditions specify the term completely with the
exception of a numerical factor which is absorbed in the coefficient.

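When the values, as do annual numbers, stand at equal intervals of 
time, $t$ may be conveniently measured from the mid-point of the series, 
in units equal to the time interval; the series is then 

$$A + Bt + C\left(t^2 - n_2\right) + D\left(t^3 - \frac{n_4}{n_2}\,t\right) + E\left(t^4 - \frac{n_6 - n_2 n_4}{n_4 - n_2^2}\,t^2 + \frac{n_2 n_6 - n_4^2}{n_4 - n_2^2}\right) + F\left(t^5 - \frac{n_2 n_8 - n_4 n_6}{n_2 n_6 - n_4^2}\,t^3 + \frac{n_4 n_8 - n_6^2}{n_2 n_6 - n_4^2}\,t\right),$$ 

as far as the term of the fifth degree, where $n_2$, $n_4$, $n_6, \dots$ represent 
the mean values of $t^2$, $t^4$, $t^6, \dots$, and for a series of $n$ terms, 

$$\frac{n_4}{n_2} = \tfrac{1}{20}\,(3n^2 - 7),$$ 

$$\frac{n_6 - n_2 n_4}{n_4 - n_2^2} = \tfrac{1}{14}\,(3n^2 - 13),$$ 

$$\frac{n_2 n_6 - n_4^2}{n_4 - n_2^2} = \tfrac{3}{560}\,(n^2 - 1)(n^2 - 9),$$ 

$$\frac{n_2 n_8 - n_4 n_6}{n_2 n_6 - n_4^2} = \tfrac{5}{18}\,(n^2 - 7),$$ 

$$\frac{n_4 n_8 - n_6^2}{n_2 n_6 - n_4^2} = \tfrac{1}{1008}\,(15n^4 - 230n^2 + 407).$$ 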
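
The successive construction of uncorrelated terms can be checked numerically. The following is a minimal sketch and not part of the original paper: it builds the terms for a series of 67 equally spaced values by removing from each power of t its correlation with the earlier terms (Gram-Schmidt orthogonalisation), and compares the result with the closed forms in n. The function name `uncorrelated_terms` and the use of exact rational arithmetic are illustrative assumptions.

```python
from fractions import Fraction


def uncorrelated_terms(n, max_degree):
    """Values of T_0 .. T_max_degree at n equally spaced times.

    Times are measured from the mid-point of the series. Each power of t
    has its correlation with every earlier term removed (Gram-Schmidt).
    """
    times = [Fraction(2 * i - (n - 1), 2) for i in range(n)]
    terms = []
    for degree in range(max_degree + 1):
        values = [t ** degree for t in times]
        for earlier in terms:
            slope = sum(v * e for v, e in zip(values, earlier)) / sum(e * e for e in earlier)
            values = [v - slope * e for v, e in zip(values, earlier)]
        terms.append(values)
    return times, terms


n = 67
times, terms = uncorrelated_terms(n, 5)
m = n * n  # n squared, which appears in every closed form below

closed_forms = {
    2: [t**2 - Fraction(m - 1, 12) for t in times],
    3: [t**3 - Fraction(3 * m - 7, 20) * t for t in times],
    4: [t**4 - Fraction(3 * m - 13, 14) * t**2
        + Fraction(3 * (m - 1) * (m - 9), 560) for t in times],
    5: [t**5 - Fraction(5 * (m - 7), 18) * t**3
        + Fraction(15 * m * m - 230 * m + 407, 1008) * t for t in times],
}
# True for a degree when the numerical term equals the closed form at every time.
agreement = {degree: terms[degree] == values for degree, values in closed_forms.items()}
print(agreement)
```

Exact fractions are used so that agreement is a strict equality rather than a comparison within rounding error.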

Variance accounted for by each term. 

The coefficients are chosen so as to make the residual variance a 
minimum; consequently the residual variance is reduced at each stage 
by an amount equal to the variance of the term itself. If $T_r$ stand for 
the term of degree $r$, the mean value of $T_p T_q$ is necessarily zero, when 
$p$ and $q$ are different, while the mean value of $T_r^2$ is 

$$\overline{T_r^2} = \frac{(r!)^4}{(2r)!\,(2r+1)!}\,(n^2 - 1^2)(n^2 - 2^2)\cdots(n^2 - r^2).$$ 

Thus 

$$\overline{T_1^2} = \frac{n^2 - 1}{12},$$ 

$$\overline{T_2^2} = \frac{(n^2 - 1)(n^2 - 4)}{180},$$ 

$$\overline{T_3^2} = \frac{(n^2 - 1)(n^2 - 4)(n^2 - 9)}{2800},$$ 

$$\overline{T_4^2} = \frac{(n^2 - 1)(n^2 - 4)(n^2 - 9)(n^2 - 16)}{44{,}100},$$ 

$$\overline{T_5^2} = \frac{(n^2 - 1)(n^2 - 4)(n^2 - 9)(n^2 - 16)(n^2 - 25)}{698{,}544}.$$ 

The variance contributed by each term, and by which the residual 
variance is reduced when that term is removed, is therefore of the form 
$B^2\,\overline{T_1^2}$, $C^2\,\overline{T_2^2}$, and so on. 
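
The five denominators 12, 180, 2800, 44,100 and 698,544 can be checked against a single general coefficient. The sketch below is an illustration, not the author's computation: it assumes the general coefficient is (r!)^4 / ((2r)! (2r+1)!), a form inferred here from the listed denominators, and compares it with the mean square of each term obtained directly by orthogonalisation for n = 67. Both function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def mean_square_closed_form(n, r):
    """Assumed closed form for the mean value of T_r squared."""
    coefficient = Fraction(factorial(r) ** 4, factorial(2 * r) * factorial(2 * r + 1))
    product = 1
    for j in range(1, r + 1):
        product *= n * n - j * j
    return coefficient * product


def mean_square_direct(n, r):
    """Mean of T_r squared, with T_r built by Gram-Schmidt on 1, t, t^2, ..."""
    times = [Fraction(2 * i - (n - 1), 2) for i in range(n)]
    terms = []
    for degree in range(r + 1):
        values = [t ** degree for t in times]
        for earlier in terms:
            slope = sum(v * e for v, e in zip(values, earlier)) / sum(e * e for e in earlier)
            values = [v - slope * e for v, e in zip(values, earlier)]
        terms.append(values)
    return sum(v * v for v in terms[r]) / n


n = 67
checks = [mean_square_closed_form(n, r) == mean_square_direct(n, r) for r in range(1, 6)]
denominators = [
    Fraction(factorial(r) ** 4, factorial(2 * r) * factorial(2 * r + 1)).denominator
    for r in range(1, 6)
]
print(checks)
print(denominators)
```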

7. Distribution of variance for unchanging series of independent values. 

A series may be said to change when the chance of an observation 
falling in any given range is a function of the time. When this is not the 
case the series is unchanging. The main difficulty in the adequate 
treatment of annual returns of economic and vital statistics lies in the 
prevalence of profound changes in the population observed. It is probable 
that the disturbing factors could be largely eliminated by the 
discriminating use of polynomials as here described. 

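If $A_p$ is the coefficient of the term of the $p$th degree, and $x$ is any 
observed value, then $A_p = S(T_p x)/S(T_p^2)$, where $S$ denotes summation 
over the $n$ observations; hence, if the values of $x$ are independent, each 
with standard deviation $\sigma$, and the series unchanging, the mean value 
of $A_p^2$ is $\sigma^2/S(T_p^2)$, and the variance of this term, $A_p^2\,\overline{T_p^2}$, 
has on the average the value 

$$\frac{\sigma^2}{S(T_p^2)}\cdot\frac{S(T_p^2)}{n} = \frac{\sigma^2}{n}.$$ 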

Hence, if polynomials are fitted to unchanging series of $n$ quantities, 
the variance contributed by each term is on the average $1/n$ of the total 
variance. After fitting $n$ terms the fit is necessarily perfect, and the 
residual variance zero. The first step merely consists in ascertaining the 
mean, as is ordinarily done before finding the variance, the subsequent 
curve fitting, applied to an unchanging series divides the variance up into 
$(n - 1)$ parts, each of which has on the average the same value. 
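
The division of the variance into (n - 1) parts can be illustrated on a small series. The following is a sketch under stated assumptions: the eight values are made up for illustration, and `term_contributions` is a hypothetical helper. It shows only that the parts exhaust the sum of squares about the mean when all n terms are fitted; the statement that each part has the same value on the average concerns repeated random samples and is not demonstrated by a single series.

```python
from fractions import Fraction


def term_contributions(series):
    """Sum of squares removed by each uncorrelated term T_0 .. T_{n-1}."""
    n = len(series)
    times = [Fraction(2 * i - (n - 1), 2) for i in range(n)]
    terms, removed = [], []
    for degree in range(n):
        values = [t ** degree for t in times]
        for earlier in terms:
            slope = sum(v * e for v, e in zip(values, earlier)) / sum(e * e for e in earlier)
            values = [v - slope * e for v, e in zip(values, earlier)]
        terms.append(values)
        sum_of_squares = sum(v * v for v in values)
        # Coefficient A_p = S(T_p x) / S(T_p^2); the term removes A_p^2 * S(T_p^2).
        coefficient = sum(v * x for v, x in zip(values, series)) / sum_of_squares
        removed.append(coefficient ** 2 * sum_of_squares)
    return removed


series = [Fraction(x) for x in (31, 27, 35, 30, 22, 29, 33, 26)]  # made-up values
removed = term_contributions(series)
mean = sum(series) / len(series)
total = sum((x - mean) ** 2 for x in series)

parts = removed[1:]  # the first step only ascertains the mean
print(len(parts), sum(parts) == total)
```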

The actual frequency distribution of these fractions may be most 
simply shown by putting $\epsilon_p^2$ for the variance of the term of degree $p$; 
$\epsilon_p$ being supposed $+$ or $-$ according to the sign of $A_p$. Thus 

$$\epsilon_p = A_p\sqrt{\overline{T_p^2}}.$$ 

If $x$ is distributed normally about its mean value, so then is $\epsilon_p$ with 
a standard deviation, which we have already evaluated at $\sigma/\sqrt{n}$. In 
unchanging series every value of $\epsilon$, except the first, is distributed about 
zero as its mean; $\epsilon_0 = \bar{x}$, the mean of the series. 

In a changing series in which the change consists mainly in an 
alteration of the mean value, we may represent the change in mean 
value by a sufficient number of terms of a polynomial, the deviations 
from the mean being then an approximately unchanging series. The 
mean value of $\epsilon_p$ is then the value corresponding to the series of means; 
its standard deviation is unaltered, provided that $\sigma$ is interpreted as 
the standard deviation of any observation from the changing mean. 

8. The significance of an observed term. 

If the change in the mean value is sufficiently represented by the 
terms up to $T_r$, then the mean value of the residual variance is 

$$\frac{n - r - 1}{n}\,\sigma^2.$$ 

If the residual variance is observed, we may therefore take 

$$v = \frac{\sigma^2}{n} = \frac{\text{residual variance}}{n - r - 1}$$ 

as the variance of each $\epsilon$ obtained. It may be that $v$ is slightly increased 
if the terms up to $T_r$ do not fully represent the course of the mean; in 
this case the significance of the preceding terms may be somewhat 
underestimated. There is no tendency to overestimate their significance. 

If $v$ is the mean variance contributed by each term, the mean 
value of the total variance contributed by $p$ terms is $pv$. If $t$ is 
the variance from an actual sample of $p$ terms, the distribution of 
$t$ is easily seen to be 

$$df \propto t^{\frac{p-2}{2}}\,e^{-\frac{t}{2v}}\,dt,$$ 

a curve of type III. The standard deviation of $t$ is therefore 

$$\sigma_t = v\sqrt{2p}.$$ 

The standard error in the determination of the annual variance is 
therefore represented by a coefficient of variation equal to $100\sqrt{2/p}$. 
In this expression $p$ will be $n - r - 1$. In the application to the 
Broadbalk wheat yield $n = 67$, $r = 5$, so the standard percentage error is 

$$100\sqrt{\tfrac{2}{61}} = 18.11.$$ 
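
The arithmetic of the last step can be reproduced directly. This is a minimal sketch; the function name `percentage_standard_error` is an illustrative assumption, and the formula is the coefficient of variation 100 * sqrt(2 / p) with p = n - r - 1 as stated in the text.

```python
from math import sqrt


def percentage_standard_error(n, r):
    """Coefficient of variation (per cent) of a variance estimated from p = n - r - 1 terms."""
    p = n - r - 1
    return 100 * sqrt(2 / p)


# Broadbalk: 67 years fitted to the fifth degree, so p = 61.
print(round(percentage_standard_error(67, 5), 2))  # 18.11
```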

The combined significance of a group of terms may be derived from 
Elderton’s tables of goodness of fit, taking for $\chi^2$ the ratio of the observed 
variance to $v$, and for $n'$ one more than the number of terms. 



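9. Magnitude of the Residuals. 

The coefficients of the fitting polynomial are given by the equations 

$$A_p = \frac{S(T_p x)}{S(T_p^2)},$$ 

where $S$ represents summation over the $n$ observations. The polynomial 
is therefore 

$$\sum_{p=0}^{r} T_p\,\frac{S(T_p x)}{S(T_p^2)},$$ 

and the residual is 

$$x\left\{1 - \sum_{p=0}^{r}\frac{T_p^2}{S(T_p^2)}\right\} - \sum_{p=0}^{r}\frac{T_p\,S'(T_p' x')}{S(T_p^2)},$$ 

in which $S'$ represents summation over all observations except $x$. 

If $x$ and $x'$ are independently distributed about a mean at zero then 
for a fixed value of $t$ the mean square residual is 

$$\sigma^2\left\{1 - \sum_{p=0}^{r}\frac{T_p^2}{S(T_p^2)}\right\}.$$ 

Now 

$$T_0 = 1, \qquad S(T_0^2) = n,$$ 

$$T_1 = t, \qquad S(T_1^2) = \tfrac{1}{12}\,n(n^2 - 1),$$ 

$$T_2 = t^2 - \tfrac{1}{12}(n^2 - 1), \qquad S(T_2^2) = \tfrac{1}{180}\,n(n^2 - 1)(n^2 - 4),$$ 

and so on. 

The extreme value for $t$ is $(n - 1)/2$; for this value, 

$$\frac{T_0^2}{S(T_0^2)} = \frac{1}{n},$$ 

$$\frac{T_1^2}{S(T_1^2)} = \frac{3(n - 1)}{n(n + 1)},$$ 

$$\frac{T_2^2}{S(T_2^2)} = \frac{5(n - 1)(n - 2)}{n(n + 1)(n + 2)},$$ 

and so on. 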




The average value of the mean square residual for all values of $t$ is 

$$\sigma^2\,\frac{n - r - 1}{n},$$ 

as is necessarily the case since each term used removes $\sigma^2/n$ of the 
variance; the reduction of variance is not the same for all values of $t$, and 
this introduces an element of heterogeneity. The reduction is greatest at 
the extremes. The polynomials tend to fit the extreme terms more 
accurately than the others. This effect is only strongly felt in the first 
and last terms. Fitting 67 terms with a curve of the 5th degree the 
variance of the first and last residual is reduced to 63.87 per cent. of 
its average value. 

The residual variance for each value of $t$ of a series of 67 terms, fitted 
to the 5th degree, is shown in the following table; the original variance 
being taken as 67, and the mean residue therefore 61. 


Table VII. 

  ±t     Residual       ±t     Residual
         variance              variance
   0      63.48         17      62.96
   1      63.46         18      62.82
   2      63.40         19      62.59
   3      63.29         20      62.27
   4      63.18         21      61.90
   5      63.04         22      61.50
   6      62.91         23      61.12
   7      62.80         24      60.83
   8      62.72         25      60.67
   9      62.67         26      60.66
  10      62.67         27      60.76
  11      62.71         28      60.81
  12      62.78         29      60.51
  13      62.87         30      59.33
  14      62.96         31      56.36
  15      63.01         32      50.23
  16      63.02         33      38.96


These are shown in Fig. 2. 
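
The tabulated residual variances can be recomputed from the expression for the mean square residual at a fixed value of t. The sketch below is an illustration rather than the author's working: it takes the original variance as n = 67, builds the uncorrelated terms up to the fifth degree by orthogonalisation, and evaluates n(1 - sum of T_p^2 / S(T_p^2)) at each position. The function name `residual_variances` and the use of exact fractions are assumptions.

```python
from fractions import Fraction


def residual_variances(n, degree):
    """Residual variance at each position, the original variance being taken as n.

    Position t has residual variance n * (1 - sum over p of T_p(t)^2 / S(T_p^2)).
    """
    times = [Fraction(2 * i - (n - 1), 2) for i in range(n)]
    terms = []
    for power in range(degree + 1):
        values = [t ** power for t in times]
        for earlier in terms:
            slope = sum(v * e for v, e in zip(values, earlier)) / sum(e * e for e in earlier)
            values = [v - slope * e for v, e in zip(values, earlier)]
        terms.append(values)
    sums_of_squares = [sum(v * v for v in values) for values in terms]
    result = {}
    for i, t in enumerate(times):
        removed = sum(values[i] ** 2 / s for values, s in zip(terms, sums_of_squares))
        result[t] = n * (1 - removed)
    return result


table = residual_variances(67, 5)
for t in (0, 19, 33):
    print(t, round(float(table[Fraction(t)]), 2))

mean_residue = sum(table.values()) / 67
print(mean_residue)  # 61, since six terms each remove one part in 67 on the average
```

The middle and end values printed here can be compared with the tabulated 63.48 at t = 0 and 38.96 at t = 33.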

The introduction of slight heterogeneity is a necessary consequence 
of the elimination of change by curve fitting; it is a weakness of the 
polynomial form that the extreme terms should be so much affected. 
The form of fitting polynomial is therefore unduly affected at its extremes 
by fortuitous circumstances; its values at the extremes give a less reliable 
index of mean yield than in the remainder of the range; still less is the 
rate of change at the extremes to be relied upon as a basis for prediction. 

It should be noticed that for each year of the series separately the 
polynomial value is an approximation to the true mean for that year. 
The variance of the polynomial value about the true mean is 
complementary to that of the observation about the polynomial value (the 
residual variance). Together they make up $\sigma^2$, the variance of the 
observation about its true mean. Thus the residual variance of the end values 
being only 38.96 parts in 67, the variance of the polynomial value must 
be 28.04; this is four times the average value so that the probable error 
of the extreme values of the polynomial is double as great as the average 
value for the whole series. 



Fig. 2. Residual variance of individual terms, according to their position in the series. 

10. Correlation of residuals. 

The formulae of the last section show that once the changes of the 
mean have been sufficiently represented, the addition of further terms 
to the polynomial is disadvantageous (i) in increasing the probable error 
of all values, (ii) in increasing the heterogeneity of the residuals. On 
the other hand, if an insufficient number of terms are taken (i) the 
residual variance will be overestimated, (ii) the residuals will be hetero¬ 
geneous by the confusion of changes with annual variation. 

It is possible to test whether the process of fitting has been 
carried far enough by means of the correlations between neighbouring 
values. The correlation between any two deviations from the mean 
of $n$ is, if they are independent, 

$$-\frac{1}{n - 1},$$ 

while the correlation between neighbouring values of a changing series 
will generally be positive. For example, in the yields of dressed grain 
from the dunged plot, 26, of Broadbalk, neighbouring values are 
evidently associated. The correlations between harvests 1, 2, 3, up to 6 
years apart are: 


Years apart        r           z          P.E.
     1          +.3559      +.3722      ± .0850
     2          +.1637      +.1652      ± .0857
     3          +.2767      +.2841      ± .0864
     4          +.1811      +.1831      ± .0871
     5          +.2030      +.2058      ± .0878
     6          +.1488      +.1499      ± .0886

Mean                        +.2429      ± .0354
Expectation                 -.0152
Difference                  +.2581      ± .0354


(In calculating averages or differences of correlations, it is advisable to use, as above, 
the function $z$, connected to $r$ by the equation 

$$r = \tanh z;$$ 

when $r$ is small, as in these cases, $z$ is little different from $r$; for large values, however, 
the inaccuracies involved in using the probable error of $r$ as though errors in $r$ were 
normally distributed, become considerable.) 
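
The transformation between r and z is easily carried out numerically. The following is a small sketch, not taken from the paper: `to_z` applies the inverse of r = tanh z, and `probable_error_of_z` assumes the probable error of z from a given number of pairs is 0.6745 / sqrt(pairs - 3), a standard form that is not stated in the text itself. Both function names are hypothetical.

```python
from math import atanh, sqrt, tanh


def to_z(r):
    """Transform a correlation r to z, where r = tanh z."""
    return atanh(r)


def probable_error_of_z(pairs):
    """Probable error of z from `pairs` paired values (assumed 0.6745 / sqrt(pairs - 3))."""
    return 0.6745 / sqrt(pairs - 3)


# Harvests 2 years apart in a 67-year series give 65 pairs of values.
z_two_years = to_z(0.1637)
pe_two_years = probable_error_of_z(65)
r_from_z = tanh(0.2841)
print(round(z_two_years, 4), round(pe_two_years, 4), round(r_from_z, 4))
```

For correlations of this size z differs from r only in the third decimal place, which is the point made in the note above.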

Evidently neighbouring values tend to be alike. This correlation 
between neighbouring values gradually disappears as the changes in 
mean value are more and more closely represented by the polynomial, 
at the same time the negative correlation to be expected from an un¬ 
changing series is gradually increased. When these two values are no 
longer significantly different, the subsequent terms of the polynomial 
cannot be of importance. 

The calculation of the correlation to be expected between residuals 
of k years apart, for an unchanging series or one in which the change 
is fully represented by the polynomial, is somewhat complex. When k is 
small compared to n it will be sufficient, as in the present applications 
of the theory, to ignore terms involving 

* 

n — r — 1 

to the third and higher powers. In these cases the correlation between 
residuals of the polynomial of degree r taken k years apart is 

_r + 1 \ 

n—r— 1\ n—r— 1 (w—r— 1 )**“/ 




Thus, after fitting the polynomial for 67 values up to the fifth degree, 
the mean correlations between residuals are: 

  k         r           z          P.E.
  1      -.0902      -.0904      ± .0850
  2      -.0817      -.0819      ± .0857
  3      -.0730      -.0731      ± .0864
  4      -.0640      -.0641      ± .0871
  5      -.0547      -.0547      ± .0878
  6      -.0452      -.0452      ± .0886

Mean                 -.0687      ± .0354

The figures for plot 26 are even more negative than the expectation. 

  k         r           z          P.E.
  1      +.0034      +.0034
  2      -.1682      -.1698
  3      -.0624      -.0625
  4      -.1490      -.1501
  5      -.0583      -.0584
  6      -.0407      -.0407

Mean                 -.0797
Expectation          -.0687      ± .0354
Difference           -.0110      ± .0354

Neighbouring residuals of this series are not less unlike than would be 
expected from an unchanging series; it may be inferred that the 
polynomial of the fifth degree sufficiently represents the course of the slow 
changes. 
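
The comparison of the observed correlations with their expectation can be retraced as follows. This is a sketch only: the six correlations are those tabulated for the residuals of plot 26, the expectation and probable error are the quoted values, and the simple unweighted mean of z is an assumption about how the mean was formed.

```python
from math import atanh

# Correlations between residuals of plot 26 taken 1 to 6 years apart.
observed_r = [0.0034, -0.1682, -0.0624, -0.1490, -0.0583, -0.0407]
expected_mean_z = -0.0687  # expectation quoted for an unchanging series
probable_error = 0.0354    # probable error quoted for the mean

# Average on the z scale, as advised for averages of correlations.
mean_z = sum(atanh(r) for r in observed_r) / len(observed_r)
difference = mean_z - expected_mean_z
within_probable_error = abs(difference) < probable_error
print(round(mean_z, 4), round(difference, 4), within_probable_error)
```

A difference smaller than its probable error gives no ground for adding further terms to the polynomial.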

III. Possible cause of slow changes. 

11. The local character of the slow changes. 

The distinction between the slow changes in mean yield and the 
annual variations is emphasised by the extremely local character of the 
former. An examination of the successive yields (a) of the experimental 
wheat at Woburn, (b) of the wheat averages for Hertfordshire compiled 
by the Board of Agriculture, (c) of barley, and (d) of grass from the 
experimental plots at Rothamsted, shows that the slow changes of mean 
yield observed on Broadbalk are not reflected in these series. To make 
the test objective correlation coefficients were found between the yields 
on plot 3a of Stackyard Field, Woburn, and (1) the actual yields of plot 
26 of Broadbalk, (2) the polynomial values and (3) the deviations from 
the polynomial for the same plot. 

Table VIII. 

Correlations of plot 3a of Stackyard Field, Woburn, with 

                        r          z
Actual yield          +.31      +.32 ± .12
Polynomial value      +.02      +.02 ± .12
Deviations            +.34      +.35 ± .12






There is no significant association of the polynomial value with the 
yields on Stackyard Field, Woburn. The latter values are indeed more 
closely associated with the deviations from the polynomial than with the 
actual yields. The correlations are all somewhat low, perhaps owing to 
the very different soil at Woburn. The probable errors refer to deviation 
in the value of the correlation to be expected in a random sample of 
36 years, the three values above are, however, taken for identical years, 
so that the probable error gives no indication of the significance of their 
differences. 

Table IX. 

Correlation of Hertfordshire mean wheat yield with 

                        r          z
Actual yield          +.31      +.32 ± .13
Polynomial value      -.16      -.16 ± .13
Deviation             +.55      +.61 ± .13

As before, the correlation with the polynomial value is not significant. 
Its actual value is negative, and the effect of including the polynomial 
terms in the actual yield, is to reduce the strong positive correlation 
shown by the deviations to a value which by itself would not be clear 
proof of any association whatever. 

The comparison with the dunged barley plot on Hoos Field is 
interesting, since it might be suspected that the slow changes in mean 
yield, at any rate on the dunged plots, were partially caused by changes 
in the manurial value of the farmyard manure employed. That this effect 
is not of great importance may be seen by comparing the polynomials 
from the dunged plot with that of plots receiving only artificial manures 
(Fig. 3). This conclusion is confirmed by the comparison with Hoos 
barley. 

Table X. 

Correlation of plot 7 (2) of Hoos Barley Field with 

                        r          z
Actual yield          +.28      +.29 ± .09
Polynomial value      +.09      +.09 ± .09
Deviation             +.23      +.23 ± .09

All the correlations are small, and that with the polynomial is 
insignificant. The inclusion of these latter has in this case slightly 
increased the correlation observed from the actual yield. 

With the yield of the grass plots the correlations are negative, but 
are all insignificant. 




Table XI. 

Correlation of plot 9 (first crop) of Park Grass with 

                        r          z
Actual yield          -.19      -.20 ± .12
Polynomial value      -.12      -.12 ± .12
Residuals             -.12      -.12 ± .12

Thus the correlation of the residual yields with neighbouring wheat 
crops is clearly significant and greater than that of the actual yields; 
with barley and hay the correlations are less distinct; in no case do the 
polynomial values give significant correlations. The causes of the changes 
in mean wheat yield on Broadbalk do not affect the wheat yields of 
Woburn or Hertfordshire, or the barley and hay of neighbouring fields 
at Rothamsted; they appear to be limited to Broadbalk field. 

The relation between yield and weather cannot be fully treated here; 
indeed, the present results may be regarded as preliminary to a thorough 
attack upon that problem. It is here only necessary to justify the 
assumption that the weather of a succession of years may be treated 
as an unchanging series (see Section 7). A full investigation of this 
point cannot be attempted until the whole meteorological data from the 
Rothamsted Station has been analysed. As an example of the method 
of investigation appropriate for testing this question in its present 
application we may cite the case of October rain. 

Hooker (1907) found that the rainfall in autumn was more influential 
than that at other times of the year. The series of October rains was 
therefore examined to see if any change in mean rainfall could be 
detected. To do this accurately the series was fitted with a polynomial of 
the fifth degree, exactly as had been done to the wheat yields, and the 
chance of obtaining larger coefficients from an unchanging series was 
calculated. Every term individually proved to be insignificant, the 
chance of finding greater variance in the first 5 terms of an unchanging 
series being .88 (compare Table II). There is thus no evidence of more 
than random changes in the October rain. 

One significant, though small, correlation was found between the 
polynomial values and the weather. On examining the correlation of 
total rainfall for each harvest year (Sept, to Aug.) with the succeeding 
crop, the following values were found. 

Table XII. 

Correlation of total rainfall with 

r z 

Actual yield ... —*<10 —-09.4: *09 

Polynomial value -*29 -*29 ±-09 

Deviations ... -*66 -*64 ±*09 




This would indicate that a perceptible proportion, nearly 9 per cent, 
of the variance of the mean yield is due to difference in the year’s 
rainfall. A further point of interest in connection with this result is that the 
optimum rainfall derived from the deviations is about 21 inches, that 
from the actual yields 17 inches, while for the polynomial value there is 
no optimum at all. This suggests that ideal conditions for the wheat 
plant on this soil require a rainfall of about 21 inches,¹ but that lower 
values though somewhat injurious to the plant, are of permanent value 
to the field by facilitating the eradication of weeds. 
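
The proportion of variance quoted above follows from squaring the correlation of the polynomial values with rainfall. A one-line check, given here as an illustrative sketch (the function name `share_of_variance` is an assumption):

```python
def share_of_variance(r):
    """Proportion of the variance associated with a correlation r."""
    return r * r


# Correlation of the polynomial values with total rainfall is -.29.
percentage = 100 * share_of_variance(-0.29)
print(round(percentage, 1))  # 8.4, described in the text as nearly 9 per cent
```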

If this be so, it illustrates a point to be borne in mind in considering 
the effect of weather upon crops, that the ideal weather for the plant is 
not necessarily ideal for the purpose of farm operations; it is not to be 
expected that in every case these two classes will be distinguishable as 
above, the one as an annual effect, the other with relatively permanent 
consequences. 


12. Possible influence of weeds. 

Of all the organic factors which influence the yield of wheat, it is 
probable that weeds alone change sufficiently slowly to explain the 
changes observed on Broadbalk. Farm weeds are notoriously difficult 
to eradicate; deep-rooted perennials throw up stems year after year 
from below the cultivated layer, whilst annuals, and on Broadbalk 
especially the slender Foxtail grass, produce numerous seeds which may 
lie dormant for several years, so baffling the most thorough attempts to 
free the field once it is badly infested. The conditions of cultivation of 
Broadbalk make the control of weeds peculiarly difficult. The wheat, 
grown year after year, has been sown in autumn in every year save one 
(1863) leaving but a short space of time for cleaning the land. The mean 
date of autumn sowing is Nov. 1.4 and that of carting the crop is Aug. 
24.9, leaving a mean interval of only 68.5 days. Compared to fields 
bearing spring-sown corn, or periodical root crops the position of Broad¬ 
balk is exceedingly unfavourable. It is therefore not improbable that if 
for a period the weeds are unchecked either from a relaxation of effort, 
or from unfavourable seasons, a considerable number of successive wheat 
yields would be reduced, giving rise to such periods of depressed yield 
as have been observed in the early fifties, and the seventies of the last 
century and about 1914. 

¹ This refers to rainfall as actually distributed throughout the year. To discover the 
optimum distribution of rainfall will require a far more elaborate analysis. 




13. Records of Weeds. 

(1) An old record exists giving the exact work of every man employed 
in each day. The record includes parts of the four years 1852 to 1855, 
and is complete for the year 1853. During this period the wheat yields 
show that the condition of the field was improving even on the unmanured 
plots. The amount of hand labour employed on Broadbalk is striking; 
in 1853 211.5 man-days and 714 boy-days were expended in weeding 
the field. The whole period of weeding operations little exceeds 100 
working days, so that roughly the work done was equivalent to the 
continuous labour of two men and seven boys, on a field of about 14 
acres. The proportion of boy labour is striking and the record is 
sufficiently detailed to show how it was utilised. 

The weeding done by the boys was principally by hand, with some 
spudding in May or June. Only men used hoes. In April boys were at 
work “picking twitch,” in June they were “pulling garlic” (presumably 
charlock) “and the larger weeds,” and in July they were “pulling wild 
oats ” in the high corn. 

If our interpretation of the change in mean yield is correct, this free 
use of boy labour was eminently successful, for the mean yield mounts 
rapidly to its maximum about 1860. Unfortunately, no comparable 
record of the employment of labour exists at other dates. 

(2) In the Broadbalk records the first botanical account of weeds is 
in 1867, a date corresponding to the beginning of the second depression 
in the mean yields (see Fig. 1). From then to 1889 the field is frequently 
described as “exceedingly foul”; from 1889 the weeds are not mentioned 
till 1904, when the weeds were sufficiently dominant to require that 
the field should be fallowed, as was done by halves in 1904-5, and again in 
1914-15. The evidence of the records thus confirms the supposition that 
the field suffered from weeds at periods about 1877, and 1910, and was 
relatively free in the high-yielding period in the nineties. 

From the observations of the seventies it is clear that the dominant 
weeds consisted of five perennials, Sonchus arvensis (Corn Sowthistle), 
Convolvulus arvensis (Bindweed), Equisetum arvense (Horsetail), Cirsium 
arvense (Creeping Thistle), and Agrostis vulgaris (Twitch grass), and 
three annuals, Polygonum aviculare (Knotgrass), Myosotis¹ arvensis 

¹ In the opinion of Dr Brenchley, whom I have had the privilege of consulting, it is 
to be doubted that this weed, even when conspicuous, was really present in sufficient 
quantity sensibly to depress the yield. 




(Birdseye), and Stellaria¹ media (Chickweed). A fourth annual, the 
Slender Fox-Tail grass, Alopecurus agrestis, was certainly regarded as 
an unimportant weed in 1867 and 1869, and is not mentioned in 1872, 
1873 and 1876, but in 1879 and the eighties it has become enormously 
abundant, and at the present time it is considered to be by far the most 
troublesome weed. In 1886 this weed had become such a pest that 
“Sir John decided that pulling up by hand should be resorted to.” In 
connection with this, it may be remembered that the Education Acts 
of 1876 and 1880 made attendance at school compulsory; the boy labour 
which had regularly hand-weeded the land in the past had evidently 
been cut off for some time before 1886. Great efforts were made again 
in 1887 to eradicate the weed by hand weeding, but the wet summers 
of 1888 and 1889 prevented this operation and the land again became 
very foul, and was partially fallowed in 1890 and 1891 by drilling the 
rows at double widths over half the field. After this the weeds seem 
to have been kept in check until Sir John Lawes’ death in 1901². In 
1904 Alopecurus agrestis was so thick that the field was given a complete 
fallow in two halves in 1904, 1905. 

14. Indirect evidence of the influence of weeds. 

Much evidence has already been adduced that the slow changes in 
the mean yield are due to very different causes from those that produce 
the annual deviations; in particular, the local character of the former has 
been mentioned. The variation due to these two causes, relative to the 
mean of each plot, may be conveniently expressed as a coefficient of 
variation. 

The most striking point about these figures is the comparative con¬ 
stancy of the coefficient of variation for slow changes, especially for 
neighbouring plots, if we exclude plots (17 and 18) for separate dis¬ 
cussion. The variation is uniformly greater in the north half (plots 26 
to 8) of the field than in the more southern portions (plots 10 to 14), the 
localisation of the effect thus showing itself even within a single field. 
The coefficients of variation from annual causes are quite different; they 
show no influence of locality, the most variable being those plots in 
which exhaustion has been most rapidly in progress. The proportionality 

¹ Possibly Arenaria serpyllifolia (Sandwort) was sometimes recorded as Stellaria media; 
the latter weed is, however, the more frequent in recent reports. 

² During the nineties parties of schoolgirls were employed at hand-picking, at Easter 
and Whitsun, on Saturdays and in the evenings. Sir John Lawes took much interest in 
their work, giving prizes to those who collected the greatest quantity. 




of the slow changes is well shown by plotting the polynomials upon a 
logarithmic scale, as in Fig. 1, which shows the changes in mean yield 
on plots 2h to 14. 


Table XIII. 

Coefficient of variation due to slow changes and annual deviations 

                          Slow        Annual
Plot                     changes     deviation
26                        12.13       16.68
3 and 4                   12.86       24.90
5                         11.56       24.17
6                         12.53       24.51
7                         11.61       22.61
8                         11.16       20.44
10                         9.83       33.32
11                         9.33       32.13
12                         9.88       26.98
13                         9.54       23.56
14                         9.00       25.26
17 and 18, minerals       14.72       31.88
17 and 18, ammonia        10.35       21.32


This proportionality, while discriminating the slow changes sharply 
from the meteorological effects, is not an unexpected feature if these 
changes are due to the varying prevalence of weeds. For in a bad period 
perhaps 25 per cent, of the area of each plot is unproductive, while 
after many years of constant attention the percentage lost may be 
reduced to about 5 per cent.; the constancy of the coefficient of variation 
in the different plots indicates that the area lost in the same year is 
roughly proportional in all plots. 

An exceptional amount of variation is shown by the mineral series 
of plots 17 and 18; this plot differs from the others in that while the 
growth of wheat is limited by shortage of nitrogen, in the previous year 
it received a nitrogenous dressing. This should be to the advantage of 
the perennial weeds which should have benefited by the previous year’s 
manure. The mineral plot should therefore suffer particularly when the 
field is infested by perennial weeds, while the corresponding ammonia 
plot should show principally the effect of annual weeds. 

A comparison of the polynomial for plot 17 and 18 minerals with 
that for plot 5 which it nearly resembles, shows that (Fig. 3) the former 
was in fact more seriously depressed in the first two periods of low 
yields, to such an extent that it has a lower average in the early 
fifties and about 1880, while in the recent depression the effect is less 
marked, its minimum being higher than that of plot 5. On the other 
hand, comparing plot 17 and 18 ammonia with plot 7, which receives 
ammonia continuously, the first two depressions of the former are 
considerably less, while the recent depression is quite as great as that of 
the latter. These differences are comprehensible if in the two earlier 
depressions perennial weeds were more influential than has been the 
case in recent years, while the recent depression is due in larger measure 
to annual weeds. Similar conclusions have been seen to be indicated 
by the references to weeds in the records. 


Fig. 3. Yield of alternating plots 17 and 18, compared with plots 5 and 7; showing the 
exaggeration of the two earlier depressions, where a nitrogenous dressing has been 
applied in the previous year. 






Summary. 

In Part I is given a survey of the results of a statistical examination 
of the yield of the plots of Broadbalk Wheat field during 67 years. The 
main features of the comparison of mean yields are well known; the 
comparative rates of decrement, shown in Section 5, supply a class of 
facts well worthy of further study. Particularly striking are the relatively 
slow rates of decrement of plots 26 and 8, compared with plot 7, which 
would seem to show a permanent advantage in very high nitrogenous 
dressings, and to emphasise the need for caution in the application of 
the principle of diminishing returns. The evidence of the influence of 
potassium sulphate and its substitutes, sodium sulphate and magnesium 
sulphate, shown in Table V, is also very striking. An unsuspected feature 
of the changes of mean yields, which precludes the possibility of obtaining 
from these data true curves of exhaustion has appeared in the slow 
changes which have taken place in all the plots in a similar manner. 
In Part II the mathematical methods by which the variation has been 
analysed have been discussed, partly as a justification of novel procedure, 
partly to make clear that the three types of variations found have been 
genuinely distinguished. In Part III such evidence as is available has 
been presented, in order to throw light upon the possibility that the 
changes in mean yield have been caused by variations in the prevalence 
of weeds at different periods. 

One point of importance which should be emphasised is that average 
wheat yields, even over long periods, from different fields or for different 
seasons cannot approach in accuracy the comparison of plots of the 
same field in the same seasons. The advantage of the method adopted 
by Lawes in the permanent experiments which he instituted is very 
evident. The effects of weather clearly require that the seasons should 
be identical, unless the series be very long, but the slow changes in 
mean yield show that even comparatively long series of different years 
from the same field cannot be accurately compared. Within the same 
field, however, the slow changes have almost proportional effects, and 
comparison between the mean yields of neighbouring plots may be made 
with great accuracy. The only case in which changes in mean yield 
sensibly affect the comparison of averages is that of plots 17 and 18. 
In comparing these with plots 3 and 4, 5, 7, and 10, it would be more 
accurate to confine attention to high yielding periods, at which the 
disturbing causes are at their minimum. 



It is believed that the deviations from the smooth curves, which 
have been freed, for the most part, from the effects of exhaustion and 
weeds, form statistically homogeneous material for the study of 
meteorological effects. 


REFERENCES 

R. H. Hooker (1907). “Correlation of the weather and crops,” Journal of the Royal 
Statistical Society, 70, pp. 1-42. 

E. A. Mitscherlich (1909). “Das Gesetz des Minimums und das Gesetz des 
abnehmenden Bodenertrages,” Landw. Jahrb. xxxviii. 637. 

K. Pearson (1914). Tables for Statisticians and Biometricians, Camb. Univ. Press. 





4 

THE ACCURACY OF THE PLATING METHOD 
OF ESTIMATING THE DENSITY OF 
BACTERIAL POPULATIONS * 


AUTHOR’S NOTE 

Starting with a purely empirical examination of the precision 
obtained in estimates of the numbers of soil bacteria, the authors were 
led to examine the properties of the small samples of the Poisson 
series, and to recognise that the statistic $\chi^2$ supplies an index of 
dispersion for sets of parallel plates by which their homogeneity may 
readily be examined. This new tool is then applied to examine the 
circumstances in which aberrant or exceptional counts have been 
found to arise, in data from Rothamsted and elsewhere. 


* Reprinted from Annals of Applied Biology, Vol. IX, Nos. 3 and 4, pp. 325- 
359, 1922. 





THE ACCURACY OF THE PLATING METHOD 
OF ESTIMATING THE DENSITY OF 
BACTERIAL POPULATIONS 

WITH PARTICULAR REFERENCE TO THE USE OF 
THORNTON’S AGAR MEDIUM WITH SOIL SAMPLES 

By R. A. FISHER, M.A., H. G. THORNTON, B.A., 

AND W. A. MACKENZIE, B.Sc. 

(Rothamsted Experiment Station) 

(With 2 Text-figures) 

1. Introduction 

The accuracy of the estimates of bacterial density, in samples of soil, 
water, or other material, obtained by the plating method, is only one 
of many points which arise in the interpretation of bacterial counts. 
The full interpretation of such data would include a consideration 
of the divers species that occur on the culture media, and of the 
forms in which they exist in the soil. The partial or total exclusion of 
certain forms, such as anaerobes, that require special cultural 
conditions, must also be considered in a full examination of such data, for 
a single medium supplies, necessarily, but a single aspect, however 
comprehensive, of the bacterial flora of the soil. Questions too, as to 
what is to be considered as the unit of enumeration—the individual 
organism as it exists in the soil, or possibly groups of such organisms 
adhering to single particles of soil, and undetached by the processes of 
sampling and dilution—whatever their importance may be, are not the 
object of the present investigation. 

For if all these inquiries could be answered with certainty and 
precision it would still remain to be discovered with what accuracy the 
numerical estimate of bacterial density, obtained from a single set of 
plates, represented the actual bacterial density in the sample, and in the 
material from which the sample was drawn. 

The question of accuracy, therefore, unlike the other elements in the 
interpretation of bacterial count data, is primarily a statistical question 
and may be thrown into the characteristic statistical form of the 
estimation of a population from a sample. Only in peculiarly favourable cases, 
however, as will be seen more clearly below, could we rely upon an 
a priori mathematical solution. 

2. The Plating Method 

The plate method of counting soil bacteria is an adaptation of the 
plate counting technique, developed by Koch in 1881, applied to the 
special conditions of soil bacteria. 

The process in general consists in making a suspension of a known 
mass of soil in a known volume of salt solution, and in diluting this 
suspension to a known degree. The bacterial numbers in this diluted 
suspension are estimated by plating a known volume in a nutrient gel 
medium and counting the colonies that develope on the plate. An 
estimate of the bacterial numbers in the original soil is then made by a 
simple calculation, the mass of soil taken and the degree of dilution being 
known. 

There are great variations in the details of the method as employed 
by various workers. These differences concern all the stages in the process 
and also the nature of the gel medium used in plating. An idea of the 
extent of this lack of standardisation may be gathered from a paper by 
Z. N. Wyant (16) in which a number of the variations in technique used 
by different workers has been collected from the literature. 

As an example illustrating the process, however, the technique used 
at Rothamsted and employed by Cutler in the bacterial count work 
discussed below, will be described. 

Ten grams of the soil sample are placed in 250 gm. of sterile saline 
solution and shaken for four minutes to obtain a suspension of the soil. 

1 c.c. of this suspension is placed in 99 c.c. of sterile saline solution and 
shaken for one minute to ensure a uniform distribution of the contained 
organisms. 1 c.c. of this second dilution is placed in another 99 c.c. of 
saline and shaken for one minute. 

Every cubic centimetre of this final dilution will then contain 
1/250,000 grams of the original soil sample. 

One c.c. of this dilution is then delivered into each of five petri dishes 
and mixed with an agar medium. After incubation the bacterial colonies 
on each plate are counted, and the mean of the five parallel counts taken. 
From this the bacterial numbers per gram of soil are estimated. 
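
The arithmetic of the dilution series just described can be sketched as follows. This is only an illustration: it assumes that 250 gm. of saline may be taken as 250 c.c., it neglects the volume of the soil itself, and the five plate counts are hypothetical numbers, not data from the paper.

```python
from fractions import Fraction

# Step 1: 10 g of soil in 250 g of saline (assumed here to be 250 c.c.).
grams_per_cc = Fraction(10, 250)

# Steps 2 and 3: two successive 1-in-100 dilutions (1 c.c. into 99 c.c.).
for _ in range(2):
    grams_per_cc *= Fraction(1, 100)


def bacteria_per_gram(plate_counts, soil_per_cc=grams_per_cc):
    """Mean colonies per 1 c.c. plate divided by grams of soil per c.c."""
    mean_count = Fraction(sum(plate_counts), len(plate_counts))
    return mean_count / soil_per_cc


# Hypothetical set of five parallel plates (illustrative numbers only).
example_counts = [30, 28, 33, 29, 30]

print(grams_per_cc)                               # 1/250000
print(float(bacteria_per_gram(example_counts)))   # 7500000.0
```

Exact fractions are used so that the dilution factor is carried without rounding; a mean of 30 colonies per plate then corresponds to 7.5 million organisms per gram.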

The bacterial numbers obtained by the plating method do not 
represent the total bacterial content of the soil. This is clear from the fact 
that on no single medium will all the physiological groups of soil bacteria 
develope. In using this method, however, it is hoped to obtain a standard 
of bacterial density by which two or more soil samples can be compared. 
To obtain this result from the method a careful standardisation of the 
whole technique is essential, in order that those sources of error that 
cannot at present be eliminated, such as the failure of some organisms 
to develope on the plates, may be rendered so uniform as to affect the 
count in a constant manner. 

This standardisation must comprise both (a) the manipulative portion 
of the technique involved in making the dilutions, and (b) the composition 
of the medium employed in plating. 

In applying results obtained by the method it is necessary to have an 
estimate of its degree of accuracy, and in order to improve it, some 
knowledge must be obtained as to which stages in the process are the chief 
causes of the variation in results. 

For the results of the plating method to have their highest possible 
accuracy, very severe conditions would have to be fulfilled. An imaginary 
experiment will perhaps serve to make the conditions clear. 

If a 10 gm. sample of soil were diluted down to a dilution of 1 gm. in 
250,000 c.c., enough material would be provided for 2½ million plates. 
The result of such an experiment would be of the highest possible accuracy, 
if one could assume that 

(I) Each plate offers the same facilities for development. 

(II) The development of any organism is independent of other 
organisms present. 

(III) Development results in only one visible colony. 

Since in practice only a few plates are prepared, two additional 
conditions are involved in the sampling theory, 

(IV) Each plate has an equal chance of receiving any organism. 

(V) The organisms are distributed independently. 

The fulfilment of the first, fourth and fifth conditions depends upon 
the perfection of the technique employed. The second and third 
conditions depend definitely on the nature of the organisms, and are only 
matters of technique in so far as this term may be employed for the 
choice or elaboration of a medium upon which the organisms, which it 
is desired to study, fulfil those conditions, and which excludes the 
interference of those which would fail to do so. 

These conditions can to some extent be tested independently. Thus, 
in a short experiment, where a single batch of medium is used, it is to 
be expected that the medium in each plate will offer the same facilities 
for development (Condition I). In a long experiment, however, where a 
number of different batches of medium are used, this will be the case 
only if the medium can be accurately reproduced, if, that is, different 
batches of medium, prepared independently, give significantly the same 
results. This reproducibility has been confirmed for Thornton’s agar 
medium (Thornton, 1922 (11)). 

Again condition (IV) would fail if from any cause the dilution was 
carried out in an irregular manner. This may be tested directly by carrying 
through the whole dilution process independently with different portions 
of the same sample. The following experiment is an example of such 
a test. 

Four portions of a sample of Barnfield soil, simultaneously analysed 
by four different workers (Aug. 14, 1921), gave the following counts: 


                      Table I 

                             Portion 
    Plate         A        B        C        D 
      1          26       28       31       37 
      2          30       33       26       32 
      3          30       32       28       32 
      4          29       26       32       30 
      5          32       27       31       26 
    Mean        29.4     29.2     29.6     31.4 

The four sets of plates are indistinguishable from random samples 
from a single population. The variance estimated as from a single sample 
of 20 is 8.52, actually less than the mean value for the variance within 
each set, 9.15. An equivalent test is provided by the correlation between 
different plates of the same set; this is -.089 ± .108, negative and quite 
insignificant. In spite of the fact that the different plates of the same 
set agree very closely, the variation between the four means is quite 
insignificant. 
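
The two variances quoted for Table I can be checked with a short calculation. The sketch below is an illustration rather than the authors' own working; the counts are those of Table I, read so that each column reproduces its printed mean, and the variable names are introduced here for convenience.

```python
from statistics import mean, variance

# Counts of Table I, read so that each column reproduces its printed mean.
table_1 = {
    "A": [26, 30, 30, 29, 32],
    "B": [28, 33, 32, 26, 27],
    "C": [31, 26, 28, 32, 31],
    "D": [37, 32, 32, 30, 26],
}

all_counts = [c for plates in table_1.values() for c in plates]

# Variance treating the 20 plates as a single sample (divisor 19).
total_variance = variance(all_counts)

# Mean of the four within-set variances (each with divisor 4).
within_variance = mean(variance(plates) for plates in table_1.values())

print(round(total_variance, 2), round(within_variance, 2))
```

Under these readings the two values agree with the 8.52 and 9.15 given in the text, the variance of the whole set being the smaller.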

                      Table II 

                             Portion 
    Plate         I        II       III      IV 
      1          72       74       78       69 
      2          69       72       74       67 
      3          63       70       70       66 
      4          59       69       58       64 
      5          59       66       58       62 
      6          53       58       56       58 
      7          51       52       56       54 
    Mean        60.86    66.86    64.28    62.86 


Equally close is the agreement between the sets of seven plates 
prepared from four parallel series of dilutions (June 22, 1922), shown in 
Table II. No trace of differentiation is observable, and the four sets 
must be regarded as random samples from a single population. 

On certain occasions the same point is established by the analysis of 
simultaneous samples from the same field. An agreement in such cases 
shows the uniformity in bacterial density of the portion of the field 
sampled; it also serves to show that no significant differences are 
introduced by variations in the process of dilution. Thus four 
simultaneous samples from Broadbalk (Aug. 14, 1921) gave the following 
counts. 

                      Table III 

                             Sample 
    Plate         I        II       III      IV 
      1          38       45       43       27 
      2          32       40       34       41 
      3          52       45       52       35 
      4          32       31       55       36 
      5          40       43       38       45 
    Mean        38.8     40.8     44.4     36.8 


From the whole set of 20 the variance is 56.27, from the four sets 
of 5, 56.97, not a significantly greater value. The correlation between 
plates of the same group is +.014 ± .108, an insignificant positive value. 
By the most sensitive tests possible, no differentiation is observable. 

There is thus reason to claim that the manipulative technique can 
be so efficiently standardised that no significant variations in it are 
detectable, having regard to the variance that occurs between the colony 
numbers developing on parallel plates from a single final dilution. 

Our attention is thus drawn to this variance between parallel plates, 
which may be due solely to the chance distribution of organisms within 
the final dilution, or may in addition be influenced by the mutual 
interference between organisms on the plates, or by the failure of certain 
organisms to develope into single discrete colonies. 

It is therefore necessary, in interpreting the results of the counting 
technique, to discover the relative importance of these influences, on the 
colony numbers, and on the variance between them. It is on the 
experimental evidence as to the actual nature of this variance between parallel 
plates that our further conclusions will be based. 

Nevertheless, the two questions of the reproducibility of the medium 
and of the equivalence of results obtained by independent series of 
dilutions made from a single sample, are here insisted upon, because 
failure in either of these two points would not necessarily affect the 
agreement between parallel platings, from the same final dilution, which 
is studied below. 


3. The Poisson Series 

It was shown by Poisson (1) in 1837, that if a large number of 
individuals, N, are each exposed independently to a very small risk of an 
event of which the probability of occurrence in any instance is p, then 
the number of occurrences, x, in any trial will be distributed according 
to a definite law, sometimes called the Law of Small Numbers. The 
distribution of x is found to depend on a single parameter 

$$ m = pN, $$

in such a way that the probability that the number of occurrences shall 
be x is given by the formula 

$$ e^{-m} \frac{m^x}{x!}. $$

It should be noted that x is always a whole number, while m may be 
fractional; the mean value of x is equal to m, and when m is large the 
distribution, except for its essential discontinuity, resembles a normal 
distribution, having its mean at m and the variance (the square of the 
standard deviation) also equal to m. 
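
The statement that the mean and the variance of the Poisson series are both equal to m can be verified numerically. The following is a minimal sketch; the value m = 2.5 and the cut-off of 60 terms are illustrative assumptions, chosen only so that the neglected tail is negligible.

```python
from math import exp, factorial


def poisson_probability(x, m):
    """Chance of exactly x occurrences when the parameter is m."""
    return exp(-m) * m**x / factorial(x)


m = 2.5                 # illustrative value; m need not be a whole number
terms = range(0, 60)    # terms beyond this are negligible for m = 2.5

probs = [poisson_probability(x, m) for x in terms]
mean = sum(x * p for x, p in zip(terms, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(terms, probs))

print(sum(probs), mean, variance)
```

The three printed numbers should be very nearly 1, 2.5 and 2.5, whatever value of m is tried, provided enough terms are summed.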

The importance of the Poisson series in modern statistics was brought 
out by “Student” (2) in 1907¹, in discussing the accuracy of counting 
yeast cells with the haemocytometer. Since the chance of any given 
yeast cell settling upon any given square of the haemocytometer is 
extremely small, while the number of cells is correspondingly great, 
“Student” arrived independently at the Poisson formula, as a theoretical 
result under technically perfect conditions. He was able to show that, 
in some instances, counts of 400 squares agreed with the theoretical 
distribution, and that when this is the case the accuracy of the count 
is known with precision and depends only on the number of cells counted². 

¹ The Poisson Series had been successfully applied by von Bortkiewicz to the annual 
number of deaths from horse-kick in a number of Prussian Army Corps(/<?). Miss Whitaker's 
criticism(5) of this application is entirely vitiated by her neglect of the variation of random 
samples. 

H. Bateman (1910)(i?) arrived at the formula for the Poisson Series, as the distribution 
of the number of α particles, emitted by a film of polonium, which strike a sensitised screen 
in successive equal intervals of time. The formula was used by Rutherford and Geiger to 
test the independence of simultaneous emissions. The distribution of 2608 counts shows 
a general agreement with expectation, though there are discrepancies not easily to be 
explained by chance. The observations are certainly not adequate, as these authors suggest, 
as “a method of testing the laws of probability.” 

The ideal conditions for bacterial counts made by the dilution 
method, are closely parallel to those found necessary in the case of the 
haemocytometer. The chief practical difference lies in the fact that 
instead of 400 squares with only a few yeast cells in each, we have some 
five plates with perhaps 200 colonies apiece. The agreement of the 
results with the theoretical distribution cannot, therefore, be 
demonstrated from a single count. Under ideal conditions the data would 
consist of a number of small samples from different Poisson series. For 
this reason as soon as it was suspected that this ideal condition might 
have been realized in practice, a special investigation of the nature of 
such samples was undertaken, owing to the importance of demonstrating 
the substantial fulfilment of the severe conditions laid down in the 
previous section. 

4. Preliminary Reduction of Cutler’s Data 

When the question of the accuracy of the bacterial counting technique 
was discussed between the present authors in the spring of 1921, it was 
decided that the daily observations of bacterial numbers then being 
carried out at Rothamsted by Cutler would afford a valuable opportunity 
of studying the variance between parallel plates and its causes. In this 
choice our investigation was more than fortunate, for no other series of 
bacterial counts known to us, of which many have been examined, would 
have gone so far in clearing up the obscurities of the subject. 

In conjunction with daily estimations of soil protozoa carried out at 
Rothamsted from July 1920, daily counts of bacteria were also made in 
the protozoological laboratory (Cutler (17)). The dilution technique used in 
this work has been described above. Plates were incubated at 18° C., 
and counted after five and seven days; the seven day counts only are 
considered here. Throughout the work the agar medium recently 
elaborated by Thornton (11) was used. The data thus supply an extensive 
test of this medium under routine conditions. 

When the statistical examination of these data was commenced it 
was not anticipated that any clear relationship with the Poisson 
distribution would be obtained; the reduction was designed to determine 
empirically the relation between the mean bacterial number calculated 
from any set of plates, and the variability of that set about the mean. 
Knowing this relation, a probable error could be assigned to each value. 

² Valuable tables of the Poisson Series have been prepared by H. E. Soper (7). 

Two statistics were calculated from each set of plates. If x stand for 
the number of colonies on each plate, and n for the number of plates, 
the necessary statistics were: 

the mean 

$$ \bar{x} = \frac{1}{n} S(x), $$

and the variance 

$$ v = \frac{1}{n-1} S(x - \bar{x})^2, $$

where S stands for summation. 

The values of v, being the estimates of the variance from small 
samples, were inevitably affected by large sampling errors, which 
depended upon the number of plates. The whole body of four-plate sets 
was therefore divided into groups, according to the value of $\bar{x}$. Thus for 
the two groups of four-plate sets having a mean number of colonies 
65-75 and 75-85, the following values of v were obtained: 


            Table IV                          Table V 
             65-75                             75-85 

  Set No.  $\bar{x}$       v           Set No.  $\bar{x}$       v 
     29    69.75     65.58             59    77.00     78.00 
     33    73.50     27.00             89    76.75    142.91 
     51    68.75   [312.25]            97    84.75    144.25 
     60    71.50   [401.67]           105    84.50     56.33 
    128    73.75     60.91            149    79.50     77.67 
    194    72.75    146.25            169    84.50    123.67 
    227    67.50     27.67            240    82.25      8.91 
    241    68.75      8.91            273    84.50     48.33 
    249    67.25      7.58            301    84.25     73.91 
    263    73.25    112.58            Mean             83.78 
    272    72.75     52.91 
    330    70.00     55.33 
  Mean of 12        106.55 
  Mean of 10         56.47 

Two facts are apparent from these results: (1) the variability of v is 
so great that accurate values are not obtained from the means of about 
10 values; (2) the difficulty of estimating the variance for given values 
of $\bar{x}$ is still further increased by the occurrence of occasional very large 
values of v. The values of v in sets 51 and 60 in Table IV are much 
greater than the other 10 values in the same group. The values of the 
means obtained by excluding and including these high values are given 
at the foot of the table. 
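
The effect of the two large values on the group mean can be seen from a short calculation. The sketch below uses the twelve values of v of Table IV as read from the table; it is an illustration only, and the dictionary layout and variable names are introduced here for convenience.

```python
from statistics import mean

# Variances v of the twelve four-plate sets of Table IV (means 65-75),
# keyed by set number, as read from the table.
table_4_v = {
    29: 65.58, 33: 27.00, 51: 312.25, 60: 401.67,
    128: 60.91, 194: 146.25, 227: 27.67, 241: 8.91,
    249: 7.58, 263: 112.58, 272: 52.91, 330: 55.33,
}
exceptional_sets = {51, 60}

mean_of_12 = mean(table_4_v.values())
mean_of_10 = mean(
    v for set_no, v in table_4_v.items() if set_no not in exceptional_sets
)

print(round(mean_of_12, 2), round(mean_of_10, 2))
```

Two sets out of twelve are enough nearly to double the mean variance of the group, which is why they must be dealt with before any smooth curve is fitted.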

The first difficulty could be overcome by fitting to the actual values 
obtained a smooth curve representing the mean v for given $\bar{x}$; before 
doing so, however, it was thought advisable to exclude as far as possible 
the exceptional large values. As a rough criterion it was decided to 
exclude those values which exceeded by more than threefold the mean 
value of the group. In the larger groups this criterion acted well; in the 
smaller groups, such as occurred for high and low values of $\bar{x}$, it was 
necessarily inconclusive, even when account was taken of neighbouring 
groups. The curve fitting was therefore confined to the region in which 
the data appeared to be sufficiently abundant. 



Curves of the form 

$$ v = A\bar{x} + B\bar{x}^2 $$

(where A and B are two constants determined from the data) were fitted 
to the four-plate data from $\bar{x}$ = 0 to $\bar{x}$ = 180, and to the five-plate data 
from 0 to 160; the curves obtained are shown in Fig. 1. 
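
The paper does not state how the constants A and B were determined. One possible procedure, shown here purely as an assumption, is unweighted least squares for a curve constrained to pass through the origin; the function name and the synthetic data are hypothetical.

```python
def fit_through_origin(xs, vs):
    """Least-squares A, B in v = A*x + B*x**2 (no constant term)."""
    s2 = sum(x**2 for x in xs)
    s3 = sum(x**3 for x in xs)
    s4 = sum(x**4 for x in xs)
    t1 = sum(x * v for x, v in zip(xs, vs))
    t2 = sum(x**2 * v for x, v in zip(xs, vs))
    # Solve the two normal equations by Cramer's rule.
    det = s2 * s4 - s3 * s3
    a = (t1 * s4 - t2 * s3) / det
    b = (s2 * t2 - s3 * t1) / det
    return a, b


# Synthetic check: points lying exactly on v = x + 0.002 x^2.
group_means = [20, 40, 60, 80, 100, 120, 140, 160, 180]
group_variances = [x + 0.002 * x**2 for x in group_means]

a, b = fit_through_origin(group_means, group_variances)
print(a, b)
```

For data following the Poisson series exactly, such a fit would return A near 1 and B near 0, which is the straight line v = x.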

The straight line, v = $\bar{x}$, represents the relation between the variance 
and the mean in the Poisson Series. The curves evidently tend to cling 
closely to this line, especially in the region (60-120) where the data are 
most abundant. The curves strongly suggested that the departures in 
these data from the Poisson samples were not, as had been expected, 
systematic, but were due to the sporadic occurrence of exceptional sets; 
the curvature in the smooth curves being perhaps largely due to the 
crudity of the criterion employed in excluding the exceptions. This view 
impressed the authors with the necessity of studying the distribution of 
small random samples from the Poisson Series, with the double object 
of devising a valid criterion for the recognition of exceptions, and of 
testing accurately whether or not the remainder were in reality such 
random samples. 

5. Small Samples of the Poisson Series 

The study of small samples, essential as it is to the development of 
adequate statistical methods, has hitherto been practically confined to 
the normal curve and surface. The following investigation may serve to 
show, that by taking account of the fundamental properties of those 
statistics which are derived by the method of Maximum Likelihood, the 
sampling problems of even discontinuous distributions admit of material 
simplification. 

In a sample from a Poisson Series, the chance of any observation 
having the value x is 

$$ e^{-m} \frac{m^x}{x!}, $$

where m is the parameter of the series. 

Hence the chance of observing a given series of values $x_1, x_2, \ldots, x_n$ is 

$$ e^{-nm} \frac{m^{S(x)}}{x_1!\, x_2! \cdots x_n!}. $$

If we estimate m from such a sample by the method of maximum 
likelihood, we have 

$$ -n + \frac{S(x)}{m} = 0, $$

so that $\bar{x}$ is the most likely value of m, and in consequence, as Fisher has 
recently shown (3), it may satisfy the criterion of sufficiency, in which 
case the distribution of any other statistic, for a given value of $\bar{x}$, must 
be independent of m. 

That this is so may be proved directly; for 

$$ e^{-nm} \frac{m^{S(x)}}{x_1!\, x_2! \cdots x_n!} $$

may be put into the form 

$$ e^{-nm} \frac{(nm)^{n\bar{x}}}{(n\bar{x})!} \cdot \frac{(n\bar{x})!}{n^{n\bar{x}}\, x_1!\, x_2! \cdots x_n!}; $$

the first factor represents the chance of obtaining a given value of $\bar{x}$, 
and the second, which does not involve m, gives the chance that the 
sample shall show any particular partition of the total, once the total 
is fixed. The distribution of any statistic which depends upon this 
partition, must therefore be independent of m, once $\bar{x}$ is fixed. The 
problem of the distribution of v is therefore susceptible of the great 
simplification, that we need only consider its distribution for given values 
of $\bar{x}$, and that this distribution is wholly independent of m. 
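
The independence of m asserted above can be checked numerically for any small sample. The sketch below is an illustration under stated assumptions: the sample of four counts is hypothetical, and the chance of the total is computed from the standard result that the sum of n independent Poisson counts with parameter m follows a Poisson series with parameter nm.

```python
from math import exp, factorial


def poisson(x, m):
    """Chance of exactly x occurrences when the parameter is m."""
    return exp(-m) * m**x / factorial(x)


def chance_of_partition_given_total(counts, m):
    """P(x_1, ..., x_n | total), built from Poisson terms with parameter m."""
    n, total = len(counts), sum(counts)
    joint = 1.0
    for x in counts:
        joint *= poisson(x, m)
    # The total of n independent counts is Poisson with parameter n*m.
    return joint / poisson(total, n * m)


def second_factor(counts):
    """(n*xbar)! / (n**(n*xbar) * x_1! ... x_n!), which does not involve m."""
    n, total = len(counts), sum(counts)
    denominator = n**total
    for x in counts:
        denominator *= factorial(x)
    return factorial(total) / denominator


counts = [3, 5, 2, 6]   # hypothetical sample of four plates
for m in (1.5, 4.0, 9.0):
    print(m, chance_of_partition_given_total(counts, m), second_factor(counts))
```

Whatever value of m is supplied, the conditional chance of the partition agrees with the second factor, which is the content of the sufficiency of the mean.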

The distribution of this, or any other, statistic, which depends upon 
a partition of an integer, must necessarily be discontinuous; when, 
however, $\bar{x}$ is large, even for small values of n, the number of possible 
values of v becomes sufficiently great for its distribution to be represented 
by a frequency curve. This procedure is the more advantageous in that, 
by the choice of a new statistic, which shall replace v, we can throw the 
distribution into a form independent of $\bar{x}$, whereas the actual partitions 
possible in the neighbourhood of equipartition, will necessarily change 
with the fractional part of $\bar{x}$. 

The frequency with which any given partition of the total, $n\bar{x}$, occurs, 
is in fact the frequency with which any given series of values are obtained 
when the total is distributed at random into n cells, the expectation 
in each being $\bar{x}$. It is well known that when this is the case, the 
statistic 

$$ \chi^2 = \frac{1}{\bar{x}} S(x - \bar{x})^2 = (n - 1) \frac{v}{\bar{x}} $$

measures the departure of the sample from equipartition, being 
equivalent mathematically to Pearson’s test of agreement between 
observation and expectation. The distribution of $\chi^2$ is well represented by a 
smooth curve independent of $\bar{x}$ of the form (Pearson’s Type 3) 

$$ df = \frac{1}{\frac{n-3}{2}!} \left(\tfrac{1}{2}\chi^2\right)^{\frac{n-3}{2}} e^{-\frac{1}{2}\chi^2} \, d\left(\tfrac{1}{2}\chi^2\right), $$

and the frequency with which $\chi^2$ exceeds successive integral values, has 
been tabulated by Elderton (4, 1902 and 5, 1914) for values of n′ from 
3 to 30. 

We are therefore in a position to test whether the conditions which 
lead to the Poisson Series are in fact fulfilled in any given body of 
bacterial data for which the counts on individual plates are known; it 
is only necessary to calculate the above index of dispersion ($\chi^2$) from each 
set of parallel plates, and to determine whether the distribution of this 
index is or is not in accordance with the distribution predicted from 
Elderton’s tables, when 

$$ \chi^2 = \frac{S(x - \bar{x})^2}{\bar{x}} $$

and n′ = n. 

The statistic $\chi^2$ thus supplies an index of dispersion for sets of parallel 
plates. If the bacterial counts conform to the Poisson distribution the 
average value of $\chi^2$ will be one less than the number of plates. For 
sufficiently numerous sets of plates the agreement may be tested more 
exactly by the use of Elderton's Tables. 
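
The index of dispersion is easily computed for any one set of parallel plates. In the sketch below the tail probability, which the paper takes from Elderton's Tables, is instead evaluated from the standard closed-form series for the chi-square distribution with a whole number of degrees of freedom; that substitution, the function names, and the choice of portion D of Table I as the example are all assumptions made for illustration.

```python
from math import erfc, exp, pi, sqrt


def index_of_dispersion(counts):
    """chi^2 = S(x - xbar)^2 / xbar for one set of parallel plates."""
    xbar = sum(counts) / len(counts)
    return sum((x - xbar) ** 2 for x in counts) / xbar


def chi_square_tail(chi2, df):
    """Chance that chi-square on df degrees of freedom exceeds chi2 (df >= 1)."""
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-chi2/2) times a finite sum of powers of chi2/2.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: a normal-tail term plus a finite series in odd powers of chi.
    chi = sqrt(chi2)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    return erfc(chi / sqrt(2.0)) + sqrt(2.0 / pi) * exp(-half) * total


plates = [37, 32, 32, 30, 26]            # portion D of Table I
chi2 = index_of_dispersion(plates)
p = chi_square_tail(chi2, len(plates) - 1)
print(round(chi2, 3), round(p, 3))
```

With n plates the index is referred to n - 1 degrees of freedom, so that its average value for Poisson samples is one less than the number of plates.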

6. The Index of Dispersion applied to Cutler’s Data 

The values of $\chi^2$ obtained from the sets of four parallel plates, grouped 
according to the value of the mean, are shown in Table VI. 

Table VI 

Number of four-plate sets classified by the mean $\bar{x}$ (rows, in groups of 
ten from 20 to 270) and by $\chi^2$ (columns, in unit groups centred at .5 to 
10.5, with the individual values entered where $\chi^2$ exceeds 11). 

  $\chi^2$     .5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5  10.5  > 11  Total 
  Total   39   22   30   16   11    6    5    4    3    2    2     16    156 

Values of $\chi^2$ over 11, by group of $\bar{x}$: 
   60: 12.3            70: 13.0, 16.0     100: 14.8, 24.5     120: 15.1 
  130: 14.2           170: 17.5           180: 24.0, 13.0     200: 17.8 
  220: 12.2, 16.8     230: 21.4           260: 11.4           270: 20.1 


No obvious relationships are observable between the value of $\chi^2$ and 
that of $\bar{x}$. There is indeed an excess of the exceptionally large values of 
$\chi^2$ (> 11) among the higher values of $\bar{x}$, but this on investigation proved 
to be completely accounted for by the epidemic character of the 
occurrences of these large values, which we shall demonstrate below (see Fig. 2). 
The longest and most severe epidemic occurred during a period (Oct.- 
Dec.) when the bacterial numbers were generally high. Within this 
period no sensible association is apparent. 

Confining attention therefore to the distribution of $\chi^2$ irrespective of 
the mean number of colonies counted, it is clear that the sets with 
exceptionally large variations, which interfered with the preliminary 
reduction of the data, are now distinguishable as those with high values 
of $\chi^2$. If the sets were random samples of Poisson Series, it appears from 
Elderton’s Tables that only 3 per cent. of the observed values should 
exceed 9. It is clear that there is here a group which must be excluded 
in considering the agreement of the remainder with the theoretical 
distribution. If this were the only irregularity in the observed numbers 
we should therefore compare them with a theoretical series having the 
same total below 9. As it is there is also some irregularity visible at the 
beginning of the series, suggesting that there is also an excess of unduly 
small values of $\chi^2$. For this reason we shall base our comparison on the 
total observed between 1 and 9, as is shown in Table VII. 

Table VII 

Comparison of observed and expected distribution of $\chi^2$, 4-plate data. 

    $\chi^2$     Expected   Observed   Difference     x²/m 
                m        m + x         x 
     0.5      24.97        39        +14.03 
     1.5      28.76        22         -6.76        1.589 
     2.5      22.72        30         +7.28        2.333 
     3.5      16.36        16         -0.36        0.008 } 
     4.5      11.27        11         -0.27        0.006 }  0.136 
     5.5       7.56         6         -1.56        0.322 } 
     6.5       4.99         5         +0.01        0.000 } 
     7.5       3.25         4         +0.75        0.173 }  0.266 
     8.5       2.10         3         +0.90        0.386 } 
    over 9     3.68        20 
                                         $\chi^2$ = 4.817, 4.324 
    Total    125.66       156               P =  .682,  .232 


Within the range from 1 to 9, the agreement of the observed with the 
expected values is striking. When tested in eight groups, the probability 
of obtaining a worse fit by chance from perfectly normal data is .682, 
and even when grouped in the most unfavourable manner, by throwing 
together consecutive positive and negative residuals, a method suggested 
by Mr Udny Yule, the probability is still .232. There is therefore no 
significant deviation of these values from expectation. 
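
The expected column of Table VII can be reproduced as follows. The sketch assumes that the index for four plates is referred to the chi-square distribution on 3 degrees of freedom, and that the theoretical series is scaled so that its total between 1 and 9 equals the observed total in that range; the closed form used for the tail probability replaces Elderton's Tables and is a standard result, not the authors' procedure.

```python
from math import erfc, exp, pi, sqrt


def tail_3df(chi2):
    """Chance that chi-square on 3 degrees of freedom exceeds chi2."""
    return erfc(sqrt(chi2 / 2.0)) + sqrt(2.0 * chi2 / pi) * exp(-chi2 / 2.0)


# Observed numbers of sets with chi^2 between 1 and 9 (Table VII): 97 in all.
observed_1_to_9 = 22 + 30 + 16 + 11 + 6 + 5 + 4 + 3

# Scale the theoretical series so that its total between 1 and 9 is also 97.
scale = observed_1_to_9 / (tail_3df(1.0) - tail_3df(9.0))

edges = list(range(0, 10))   # 0, 1, ..., 9
expected = [
    scale * (tail_3df(low) - tail_3df(high))
    for low, high in zip(edges, edges[1:])
]
expected_over_9 = scale * tail_3df(9.0)

print([round(e, 2) for e in expected], round(expected_over_9, 2))
```

The proportion beyond 9 comes out at about 3 per cent., so that of some 126 sets only three or four would be expected there, against the 20 observed.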

Of those above 9, we may anticipate that some three or four will be 
normal values and the remainder exceptions. It is of course impossible 
to separate these with absolute certainty. In discussing the evidence for 
epidemics we shall assume that the four values below 11 are normal and 
that the remainder are exceptions. When, however, the fact of the 
epidemic incidence of these exceptional values is taken into account, it 
appears that the two between 10 and 11 are among the relatively few 
“normal” sets occurring in an epidemic period and are therefore probably 
exceptions, while the two between 9 and 10, and possibly also the value 
at 11.4, are for the same reason probably normal. 

It is thus possible to separate this class of exceptions from the 
remaining data with some degree of certainty and to study them 
individually, but this is not possible for the exceptionally invariable sets. 
All that we can do here is to show that the evidence for their real 
existence is stronger than appears in Table VII. If we subdivide the 
region of the first two groups of that table somewhat more closely we 
obtain 

Table VIII. 

    $\chi$           0      .75     .95    1.15    1.35 
    Expected       11.82    9.97   12.56   14.15 
    Observed         21      12      17       9 

the excess of numbers is most clearly marked in the group of smallest 
values, and is possibly though not certainly confined to this region. 

These conclusions arc independently confirmed by the sets of five 
parallel plates. In Table IX is shown a comparison of the observed 
distribution with that expected, on the basis of the total observed 
between 2 and 11. 

The agreement with expectation in the range from 2 to 11 is perfectly 
satisfactory; when tested in the 9 unit groups, the possibility of obtaining 





a worse fit by chance from normal data is .765. Grouping together the
consecutive positive and negative errors, it only falls to .571. There is
again no significant deviation of the distribution in this range from 
expectation. 

Table IX

Comparison of observed and expected distribution of 5-plate data

    χ²      Expected   Observed   Difference    x²/m
               m         m + x        x

    .5       10.94        25       +14.06
   1.5       21.10        27       + 5.90
   2.5       21.58        24       + 2.42
   3.5       18.41        20       + 1.59
   4.5       14.39        12       - 2.39       .397
   5.5       10.69        11       +  .31
   6.5        7.67         9       + 1.33
   7.5        5.37         5       -  .37
   8.5        3.70         0       - 3.70
   9.5        2.51         3       +  .49
  10.5        1.68         2       +  .32
 over 11      3.22        18

 Total      121.26       156

 χ² = 4.927, 2.938        P = .765, .571
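The first value of χ² quoted for Table IX can be checked by a short calculation. The following is a minimal sketch in Python and is not part of the original paper: the name `chi2_sf_even` and the variable names are illustrative, and the tail probability is taken from the closed-form series for an even number of degrees of freedom instead of from Elderton's Tables. It uses the nine unit groups between 2 and 11, with 8 degrees of freedom, on the assumption that one degree is lost because the expectation is fitted to the total observed in that range.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square > x) when df is an even whole number."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Unit groups of Table IX between 2 and 11: (expected m, observed m + x).
groups = [(21.58, 24), (18.41, 20), (14.39, 12), (10.69, 11), (7.67, 9),
          (5.37, 5), (3.70, 0), (2.51, 3), (1.68, 2)]

chi2 = sum((obs - m) ** 2 / m for m, obs in groups)
p = chi2_sf_even(chi2, len(groups) - 1)   # 9 groups, total fitted: 8 degrees of freedom
print(round(chi2, 3), round(p, 3))
```

Under these assumptions the sketch should give χ² close to 4.93 and P close to .765, in agreement with the first pair of figures under the table.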


Of the values above 11, three lie between 12 and 13, and in discussing 
the evidence for epidemics we shall assume that these are normal sets, 
and that all those above 13 are exceptions. When we take the evidence 
of epidemic incidence into account, it is found that the only four sets 
above 13 which might reasonably be considered normal all occur in 
epidemic periods, and that the same is true of one out of the three between 
12 and 13. This therefore (No. 160, see Fig. 2) is probably also an 
exception. 

The conclusions to be drawn from the 4-plate and from the 5-plate
data thus confirm each other at every point. In both groups the sets
having exceptionally high variability may be identified in almost every 
case with certainty. The majority of both groups, about 124 of the 
4-plate sets, and about 117 of the 5-plate sets, are evidently true samples 
of the Poisson Series. Both groups show an excess of cases of small 
variability, but it is not possible to specify the actual sets affected by 
this; it is evident that this cause, like that which produces high variability,
is sporadic and not systematic in its action; it affects a certain
number of sets in a definite manner, leaving the majority unaffected. 
This effect, whatever be its nature, is more clearly brought out in the 






5-plate than in the 4-plate sets, possibly because the sets of five plates
make possible a closer scrutiny into the exactitude of the agreement 
between the observed sets, and samples from a Poisson Series. 

For the same reason the 50 sets of three plates cannot be expected 
to provide much additional information. The seven exceptionally high 
values stand out perfectly clearly; the lowest is 9.2, a value which would
be exceeded by only one normal sample (of 3) in 100. The next highest
values, 5.4 and 6.4, would not be suspect save for their occurrence in
December; they will be treated as normal. 

Since the 3-plate sets are relatively scanty, we can best test their
agreement with theory by dividing the theoretical distribution of 43 
values at its quintiles, so that the expectation is the same in each group. 
We then have 


Table X. Sets of three plates

χ² = 1.77        P = .775

     χ²       Expected   Observed      x²
                 m         m + x

   0
                8.6          8         .36
   .4464
                8.6          6        6.76
  1.0216
                8.6         11        5.76
  1.8326
                8.6          8         .36
  3.2190
                8.6         10        1.96

  Total        43           43       15.20
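The division at the quintiles used in Table X follows directly from the form of the distribution for sets of three plates. The sketch below, in Python, is an illustration and not taken from the paper; the variable names are assumptions. With three plates χ² has 2 degrees of freedom, so the chance of exceeding x is exp(-x/2) and the dividing points can be written down in closed form. The test of the five equal groups is taken on 4 degrees of freedom.

```python
import math

# For 3 plates chi-square has 2 degrees of freedom, so P(chi2 > x) = exp(-x/2)
# and the point below which a fraction q of samples falls is -2*ln(1 - q).
quintiles = [-2.0 * math.log(1.0 - q) for q in (0.2, 0.4, 0.6, 0.8)]

observed = [8, 6, 11, 8, 10]               # Table X, sets falling in each fifth
expected = sum(observed) / len(observed)   # 43 / 5 = 8.6 in every group
chi2 = sum((o - expected) ** 2 for o in observed) / expected

# Upper tail for 4 degrees of freedom: exp(-x/2) * (1 + x/2).
p = math.exp(-chi2 / 2.0) * (1.0 + chi2 / 2.0)
print([round(b, 4) for b in quintiles], round(chi2, 2), round(p, 3))
```

The dividing points should agree with those of the table to about one unit in the fourth decimal place, χ² should come out at about 1.77, and P a little above the tabulated .775, the small difference presumably arising from interpolation in the printed tables.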


The agreement with expectation is excellent, and the sets of three 
plates bear out the conclusions derived from the sets of four and five
plates, save that here there is no visible excess of low values of χ².

It appears therefore that out of the 362 sets of plates examined the
majority represent true samples from the Poisson Series, such as would 
be the case if the biological and technical difficulties of the bacterial 
count method as applied to soil had been completely surmounted. Forty 
sets, which can be identified almost with certainty, are affected by some 
cause or causes which greatly increase the variability between the plates, 
while probably a smaller number, including apparently none of the 
3-plate sets, are affected by a second cause of error, which reduces the 
variability between the plates. 





7. The exceptionally Variable Sets in Cutler’s Data 

The records of the exceptionally variable sets of plates which occurred 
in Cutler’s data, when identified by the method of the preceding section,
were studied individually with a view to gaining light upon the cause of 
their occurrence. As it is not necessary to reproduce the whole of the 
statistical tests which were applied, we shall confine ourselves to the 
main facts which emerged, and which served to justify the previous 
conclusions, as well as to indicate the nature of the disturbing cause. 

The following facts appear to be unquestionable:

(1) The proportion of exceptionally variable sets is the same for the 
sets of three, four and five plates in each portion of the total period. 

(2) The proportion of exceptionally variable sets varies greatly at 
different periods, the exceptions occurring in well marked epidemics. 

The evidence for these statements may be put in the form of a triple 
contingency table (see Fig. 2) 

Table XI

          Excessively variable         Not excessively variable             Total
Period       5       4   3   Total         5         4   3     Total    5    4    3 Total      χ²

     1       1       —   1       2         9         9   9        27   10    9   10    29    .967
     2       3       2   1       6         7        12   6        25   10   14    7    31   1.728
     3       —       —   1       1        12        18   4        34   12   18    5    35   6.176
     4       3  (10) 9   1 (14) 13         5   (11) 12   4   (20) 21    8   21    5    34    .818
     5   (6) 5   (5) 4   1 (12) 10     (6) 7   (14) 15   4   (24) 26   12   19    5    36   1.733
     6       —       —   —       —        19        18   —        37   19   18    —    37       —
     7       —       —   —       —        22        13   1        36   22   13    1    36       —
     8       —       —   1       1        20        11   5        36   20   11    6    37   5.310
     9       1       —   —       1        17        12   6        35   18   12    6    36   1.029
    10       2       1   1       4        23        20   4        47   25   21    5    51   1.299

 Total (16) 15 (18) 16   7 (41) 38 (140) 141 (138) 140  43 (321) 324  156  156   50   362  19.060


in which the whole of the 362 observations are divided, 

(1) according to the number of plates observed, 

(2) in ten periods of time of alternately 36 and 37 days, into which 
the year was divided, 

(3) according as they are judged to be exceptionally variable, or not, 
solely upon the evidence of the index. The subdivision which would be 
made taking also into account the evidence for epidemics is shown in 
brackets, but in discussing the evidence for epidemics these modifications 
are ignored. 





To test the first point, each line of Table XI is treated as a 2 × 3
contingency table, and the value of χ² calculated from it. It has been
shown (Fisher, 1922(6)) that as in such a table there are two degrees of
freedom, χ² will be distributed, if there is no association, as in Elderton’s
Tables when n' = 3. To show that at no period is there significant
association, the values of χ² for the 10 periods are added, and the resulting
quantity should be distributed as in Elderton’s Tables when n' = 21.
Since in two consecutive periods no exceptionally variable sets occurred,
these periods have been omitted, and n' is taken to be 17. It will be seen
from the table that all the values of χ² are less than 2, except in two
periods in which only a single exceptionally variable set occurred. Such
cases are evidently beyond the range of effective application of the χ²
test, but even including these high values, P = .266, and therefore there
is no significant departure from the rule that sets of three, four and five
plates show equal proportions of exceptions in all sections of the period
of observations.
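The treatment of each line of Table XI as a 2 × 3 contingency table may be illustrated as follows. This is a minimal sketch in Python, not the authors' own working: the function names are hypothetical, only three of the periods are worked, and the tail probability for the summed χ² is computed from a closed-form series, on the assumption that Elderton's n' = 17 corresponds to 16 degrees of freedom.

```python
import math

def chi2_2xk(row_a, row_b):
    """Chi-square for a 2 x k contingency table given as two rows of counts."""
    cols = [a + b for a, b in zip(row_a, row_b)]
    n = sum(cols)
    total = 0.0
    for row in (row_a, row_b):
        r = sum(row)
        for obs, c in zip(row, cols):
            e = r * c / n              # expectation if there is no association
            total += (obs - e) ** 2 / e
    return total

def chi2_sf_even(x, df):
    """P(chi-square > x) when df is an even whole number."""
    half, term, total = x / 2.0, 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Lines of Table XI: exceptionally variable and not exceptionally variable
# sets of 5, 4 and 3 plates, for periods 1, 3 and 5.
period_1 = chi2_2xk([1, 0, 1], [9, 9, 9])
period_3 = chi2_2xk([0, 0, 1], [12, 18, 4])
period_5 = chi2_2xk([5, 4, 1], [7, 15, 4])

# Eight periods of 2 degrees of freedom each; 19.060 is the total quoted in the table.
p_total = chi2_sf_even(19.060, 16)
print(round(period_1, 3), round(period_3, 3), round(period_5, 3), round(p_total, 3))
```

For these three periods the sketch should return values close to the .967, 6.176 and 1.733 of the table, and the probability for the total should come out near .266.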

This fact confirms the justness of the criterion by which the exceptions 
have been identified, for any error in the method of identification would 
naturally show itself in the proportion of cases regarded as exceptions; 
in the second place it indicates that the cause of exceptional variability 
is not connected with the causes which lead to the rejection of individual 
plates (contamination, development of fungi or overgrowth by B.
dendroides), and in the third place it shows that the exceptions are not
caused by the exceptional deviation of a single plate, for in this case the
proportion of 5-plate sets would necessarily be highest. The third
conclusion is borne out by an examination of the numbers counted on 
individual plates, and both it and the second conclusion are more 
decisively drawn from the contingency table by ignoring the period of 
occurrences. 


Table XII

No. of plates ...                5       4       3     Total

Exceptionally variable          16      18       7       41
Not exceptionally variable     140     138      43      321
Total                          156     156      50      362


The numbers in the smaller groups are here sufficient to make a 
satisfactory test, and the value of P, .739, shows distinctly that there is




[Fig. 2. Cutler’s Data. Sets of plates for each day of the year, marked as exceptionally variable or not exceptionally variable.]

no significant difference in the proportion of exceptions between the 
several groups of observations. 

Similarly the distribution of the exceptions in time, in which we have 
shown the different groups to agree, may be best shown by taking the
totals, irrespective of the number of plates in each set. If this is done
we have a 2 × 10 contingency table, of which the value of χ² proves to
be 57.820.

Since n' = 10, the chance of such a distribution occurring under
conditions of random occurrence in time is about 4 × 10⁻⁹. It is indeed
obvious from inspection of Fig. 2 that the exceptional values occur in 
groups together, although perfectly normal values continue to occur 
throughout the worst of these epidemics. During the first outbreak 
seven exceptions occurred with 14 normal values among them; the second 
epidemic period was more prolonged and included 27 exceptions and 
46 normal values. In the second half year of the experiment only six 
exceptions occurred, of these two occurred on the same day (355) during 
the last fortnight, when duplicates were taken, and two others, 338 and 
340, were but two days apart. 

Bearing these points in mind, we have no hesitation in concluding, 
on purely statistical evidence, that the exceptionally variable sets of 
platings were due to two causes: (a) a predisposing cause which is at
work throughout the epidemic period, and (b) some additional circumstance,
in the absence of which the counts obtained will still be normal.

8. Special Organisms which affect the Number of 
Colonies developing 

In the daily counts above considered, a uniform technique was 
followed throughout, and fresh batches of medium were made up at 
frequent intervals. It is conceivable that occasional differences in plating 
technique, in the medium, or in counting the plates may by chance have 
occurred on certain days. It is however most unlikely that any such 
differences can have extended over the long periods covered by the 
epidemics of high variance, without the fact being noticed. In seeking 
a predisposing cause of variance, covering these periods, therefore, one’s 
attention is naturally drawn to possible changes in the soil itself or in 
its population. 

It is known that certain micro-organisms, when growing on the 
medium, exert an inhibitory action on the development of colonies by 
other forms. The appearance of such an organism in the soil population, 
during certain periods, might therefore give rise to periods of higher 




variation between parallel plates, for unless present in very large numbers 
it would not appear on all the plates or even in every batch of five plates. 

An example of high variation between parallel plates, that was 
actually traced to such an organism, is given to illustrate this cause of 
inaccuracy. 

The soil used in this case was from the Leeds Experimental Farm, 
and had received a treatment of naphthalene. Thirty parallel platings of 
this soil were made on Thornton's agar. The counts of colonies on these 
plates are given in Table XIII.

Table XIII

Parallel plates of Leeds soil

 Plate    Number of        Plate    Number of
  No.     colonies          No.     colonies

   1*       240             16        126
   2*       209             17        126
   3*       177             18        126
   4*       158             19        121
   5        157             20        120
   6*       154             21        119
   7        151             22        118
   8        137             23        117
   9        136             24        114
  10        132             25        113
  11        131             26        109
  12        131             27         99
  13        130             28*        91
  14        128             29*        91
  15        127             30*        87

 (Plates italicised in the original are marked *.)

 χ² Index.  Whole series = 230.17
            Minus the italicised plates = 27.81
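The two values of the index quoted under Table XIII can be recomputed from the counts. The following Python sketch is illustrative only; the function name `dispersion_index` is an assumption, and the index is taken, as elsewhere in the paper, to be the sum of the squared deviations from the mean of the set divided by that mean. The affected plates are those named in the text, namely 1, 2, 3, 4, 6, 28, 29 and 30.

```python
def dispersion_index(counts):
    """Sum of squared deviations from the mean, divided by the mean."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / mean

# Colony counts on plates 1 to 30 of Table XIII, in order.
counts = [240, 209, 177, 158, 157, 154, 151, 137, 136, 132,
          131, 131, 130, 128, 127, 126, 126, 126, 121, 120,
          119, 118, 117, 114, 113, 109, 99, 91, 91, 87]
affected = {1, 2, 3, 4, 6, 28, 29, 30}   # plates on which the organism occurred

whole = dispersion_index(counts)
rest = dispersion_index([c for i, c in enumerate(counts, start=1)
                         if i not in affected])
print(round(whole, 2), round(rest, 2))
```

With 30 plates the expectation of the index is 29, and with the remaining 22 plates it is 21, so that the first value is far in excess of random sampling while the second is of the order expected.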


It will be seen that the variation between parallel plates in the whole 
series is excessive. In examining the plates, some were found to contain 
an organism forming a growth between the agar and the bottom of the 
dish. This organism occurred on the plates italicised in Table XIII. 
It is a motile organism and apparently spreads in the water film underlying
the agar. On plates 28, 29 and 30, the growth of this organism
was sheet-like and from the low counts obtained it would appear that
its growth has reduced colony development. On plates 1, 2, 3, 4 and 6,
it has produced a number of separate colonies underlying the agar.
These colonies were probably produced by individuals which had multiplied
and migrated along the bottom of the dish after the agar had set,





but could not be separated from other colonies in counting the plate. 
The counts on these plates are therefore excessive. The presence of this 
organism on the bottom of the plates has thus produced an abnormal 
variation in the whole series. It will be seen that, if plates on which it
occurs are ignored, the index for the remaining 22 plates falls within 
the expectation of random sampling. 

A pure culture of this organism was obtained and a plating from a 
sample of Rothamsted soil was made, a small loopful of suspension of
the organism being added to the first dilution flask. Table XIV, Series A,
shows the colonies developing on six parallel plates of the soil thus
treated, compared with a control series of plates of the same soil not
inoculated, Series B, which were made at the same time.

Table XIV

Effect of Leeds soil organism on colony development from suspension of
Rothamsted soil

      Series A. Suspension inoculated            Series B. Control

 Plate    Number of    Area of bottom          Plate    Number of
  No.     colonies       spreading              No.     colonies

   1         85           nil                    1         95
   2         79           nil                    2         90
   3         78           nil                    3         80
   4         70           nil                    4         85
   5         58           nil                    5         85
   6         60          2.25 sq. cms.           6         82
   7         50          7.75                    7         81
   8         45         27.0                     8         77
   9         41         54.5                     9         73
  10         39         56.5

 χ² Index, Plates 1 to 5  =  5.86              χ² Index = 1.89
           Plates 1 to 10 = 40.01


In this case the organism formed a spreading growth over the bottom. 
The area of this spreading growth, where it occurred, was measured and 
is shown in Table XIV. It will be seen that the reduction in colony 
development is clearly related to the amount of spreading growth. In 
this series of plates it is also evident that the variation is greatly increased 
by the occurrence of the organism on certain of the plates. 

From an abnormally variable series of plates of Rothamsted soil a 
second organism has been isolated, whose frequent habit it is to spread 
on the under surface of the agar, and which has a similar inhibitory 
action on the development of other colonies. Table XV shows two sets 







of plates of a suspension of Rothamsted soil, one set of which was 
inoculated with this organism. The reduction of, and increased variation 
in colony numbers are again well seen. 

Table XV

Effect of toxic organism from Rothamsted soil on colony development
from a soil suspension

        Series A                      Series B
    Plates inoculated                  Control

 Plate    Number of            Plate    Number of
  No.     colonies              No.     colonies

   1        192                  1        179
   2        168                  2        171
   3        147                  3        168
   4        130                  4        150
   5        127                  5        150
   6        113

 Mean 146.1                    Mean 163.6
 χ² index = 29.47              χ² index = 4.17


It is of course impossible to decide, with certainty, from a simple 
record of colony numbers, whether the presence in the soil of some such 
organism was the cause of the epidemics of variable plate-sets in Cutler’s 
series. However, the above two cases of high variance between parallel 
plates, which have been traced to the presence of definite organisms, 
show that this factor, though apparently of infrequent occurrence, is 
capable of causing a disturbance in the colony numbers of precisely the 
kind actually observed. It is important to notice that this, probably like
all other causes that produce a sensible departure from the Poisson
Series, seriously disturbs the mean value.

9. The Occurrence of Subnormal Variation
It has been shown that in a small proportion (about 34 cases) of 
Cutler’s data, the variation between parallel plates has been apparently 
lowered by some disturbing agency. The same phenomenon in a much 
aggravated form appears in Owen’s data (section 10), and has from time 
to time occurred in Thornton’s work. For example the 20 plates shown 
in Table I display an unduly low variation, and though this fact does 
not detract from the value of the data in proving the equivalence of 
parallel dilutions, it does throw suspicion on the value of the mean as 
an estimate of bacterial density. A similar depression appears in Table 
XIV, Series B. 







Unlike the excessively variable sets, the sets with subnormal variance 
cannot be identified individually in Cutler’s data, and we have therefore 
less evidence upon which to put forward a biological explanation of the 
phenomenon; certain facts, however, concerning observations made in
the course of 1921, suggest that additional precautions in the preparation
of the medium may be effective in eliminating the disturbing cause.

The additional data were accumulated in the Bacteriological Department¹
in the summer and autumn of 1921 in the course of some work
on the relationship of bacterial numbers to nitrate content in the field 
soil. In each of these experiments a series of some 45 samples of soil 
were taken from a plot 9 by 15 feet in area and the bacterial numbers 
in each sample estimated by the plate method using Thornton’s agar 
medium. The first experiment was carried out with the dunged plot 
in Barnfield. The technique used was similar to that employed in 
Cutler’s work, five parallel platings being made of each sample and the 
colonies counted after an incubation of seven days at 20° C. 

Of the 33 sets available, three show excessive variance, the remainder 
are distributed as in Table XVI.


Table XVI

χ² = 3.08     P = .381

   χ²     5-plate   4-plate   3-plate   Total      Expected    x²/m

   .5        —         —         1        1   }
  1.5        4         2         —        6   }      9.78       .79
  2.5        2         —         —        2   }
  3.5        4         2         —        6   }      9.39       .21
  4.5        4         —         1        5   }
  5.5        —         2         —        2   }      5.56       .37
  6.5        3         1         —        4   }
  7.5        3         —         —        3   }      5.06      1.71
  8.5        1         —         —        1   }

 Total      21         7         2       30                    3.08


It will be seen that these agree well with the Poisson Series, and show 
no sign of subnormal variation. 

A second experiment was carried out at Kingsthorpe Hall, Northampton.
The soil is here of a markedly different type from the heavy
Rothamsted soil, being a light ferruginous loam. In this experiment the 
technique was varied in that the colonies on each plate were counted 
twice, after seven and twelve days’ incubation. It will be sufficient to 
compare the observed and expected values of the total, S(χ²), for different
groups of plates. 

¹ The authors wish to acknowledge their indebtedness for the assistance rendered by
other Departments at Rothamsted in this work.








Table XVII

 Number                    After 7 days             After 12 days
 of plates    Medium    Expected   Observed      Expected   Observed
 per set

     4          A          18        13.85          24        27.31
     5          A         152       109.33         144       133.96
     9          A           8         1.96           8         8.73
   Total        A         178       125.14         176       170.00

    20          B          19        10.45          19        25.34


In all these groups where medium A is used the variance is distinctly 
subnormal after 7 days, but is apparently normal after 12 days. With 
medium B, the variance is normal at both counts. Now the sets of 9 and 
of 20 plates were parallel dilutions of the same sample, and the mean 
count from medium A was only 75 per cent. of that obtained on medium
B. The abnormality of medium A was afterwards traced to the temperature
at which it was filtered, a technical detail which has an important
bearing on the ability of the medium to support bacterial growth
(Thornton, 1922 (ii)).

In the comparison given by Thornton (ii) of the two batches of 
medium, identical save that one was filtered at 50° C. and the other at
100° C., 10 plates being prepared from each, the former gave a mean
count 79 per cent. of the latter; in this case also the defective medium
showed subnormal variance giving a value 3.2 (after eight days),
whereas the normal medium gave a value 10.3. The former would only
occur once in 22 trials by chance, and therefore represents clearly a 
subnormal condition. 
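The statement that so low a value would occur only once in 22 trials may be checked by computing the lower tail of the χ² distribution. The sketch below, in Python, is an illustration under stated assumptions and not the authors' calculation: the index for the defective medium is read as 3.2, the 10 plates are taken to give 9 degrees of freedom, and the helper `chi2_sf` is a hypothetical name for a closed-form tail sum valid for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for whole-number df."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite series.
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

plates = 10
lower_defective = 1.0 - chi2_sf(3.2, plates - 1)   # chance of an index as low as 3.2
lower_normal = 1.0 - chi2_sf(10.3, plates - 1)     # the same for the normal medium
print(round(lower_defective, 3), round(1.0 / lower_defective, 1),
      round(lower_normal, 2))
```

On these assumptions the chance of an index as low as that of the defective medium lies between 1 in 20 and 1 in 25, while the value 10.3 for the normal medium falls well within the body of the distribution.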

Whatever the biological explanation of subnormal variance may be, 
it is therefore sometimes indicative of a serious error in the value of the 
mean. In this respect it is a danger signal which cannot be disregarded. 
When a set of plates shows excessive variability no one will be tempted 
to lay too much stress upon their mean; it is obvious in such cases that 
there is a large probable error, and it has been seen (Section 8), that 
there will usually be also a considerable systematic error in such cases. 
A set of plates with abnormally low variance on the other hand, may 
appear to be particularly good data, although, as we have just seen, this 
type of abnormality is also indicative of large systematic errors. It is 
therefore of practical importance that such departures from the Poisson 
distribution should be detected, whenever they occur. Since subnormal 






variation cannot be detected with certainty in a small set of plates, we 
recommend that occasional sets of 10 or 20 plates should be prepared 
from time to time, and that if necessary every batch of medium prepared 
should be tested in this way, the colonies being counted after seven days. 

10. The χ² Index of Variability applied to other
Bacterial Count Data 

It has been shown by the use of the index of variability, that the 
great bulk of Cutler’s data on soil bacteria appears to be true samples 
from the Poisson Series, and that therefore the accuracy of these results 
is known with precision; also that, by the same method, a small 
proportion of exceptions may be detected in which some definite disturbing
cause has interfered with the accuracy of the results. It is
therefore desirable to apply the same test to other sufficiently extensive 
bodies of material, in order to ascertain if, by other methods, a similar 
degree of accuracy can be obtained, and failing that, if further light can 
be thrown on the problems of the dilution method. Data from four 
sources have been examined in this way. 

(A) Buddin’s counts of soil bacteria at Rothamsted, using a gelatine 
medium. 

(B) Counts of soil bacteria published by Engberding (1909 (12)).

(C) Breed and Stocking’s tests of the accuracy of counting B. coli 
in milk (1920(13)). 

(D) W. Owen’s bacterial counts in sugar refinery products (1914 (14)). 

In the aggregate we have tested over 1000 sets of parallel plates; 

owing to the bulk of the total examined it is possible that a small 
proportion of arithmetical errors has been included, although the 
application of the method is much more expeditious than that of the 
preliminary investigation of Cutler’s data. Only the obvious and 
unquestionable features of each body of data will be dealt with. 

(A) Buddin’s data 

A very large number of bacterial counts were made at Rothamsted 
by W. Buddin, to whom we are indebted for permission to make use of 
these data. The actual plate counts, though not published, formed the 
basis of bacterial number estimations used in Buddin’s work on the 
effect of antiseptics on soil (15).

The platings in this work were made on a nutrient gelatine having 
the following composition:—Wittes peptone 40 grams, Lemco 20 
grams, NaCl 20 grams, gelatine 480 grams, distilled water 4000 c.c. 




The counts therefore supply an example of the degree of accuracy 
obtained with a gelatine medium, where a considerable source of variance 
is produced by the occurrence of liquefying organisms on the plates. 

From the mass of data available, 100 sets of triplicate platings were 
extracted. The expected and observed values of χ² in this series are
shown in Table XVIII. 

Table XVIII

    χ²      Expected   Observed   Difference

    .5        39.3       25.5       -13.8
   1.5        23.9       26         + 2.1
   2.5        14.5       12         - 2.5
   3.5         8.8       10.5       + 1.7
   4.5         5.3        6         +  .7
   5.5         3.2        4         +  .8
   6.5         2.0        3         + 1.0
   7.5         1.2        4         + 2.8
  over 8       1.8        9         + 7.2

  Mean         2.0        3.04
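The expected column of Table XVIII can be derived directly, since for triplicate platings χ² has 2 degrees of freedom and its distribution is a simple exponential. The following Python sketch is illustrative and not from the paper; the names `n_sets` and `upper_tail` are assumptions.

```python
import math

n_sets = 100   # triplicate platings, so chi-square has 2 degrees of freedom

def upper_tail(x):
    """P(chi-square with 2 degrees of freedom > x)."""
    return math.exp(-x / 2.0)

# Expected number of sets in each unit interval 0-1, 1-2, ..., 7-8, then over 8.
expected = [n_sets * (upper_tail(k) - upper_tail(k + 1)) for k in range(8)]
expected.append(n_sets * upper_tail(8))

observed = [25.5, 26, 12, 10.5, 6, 4, 3, 4, 9]   # Table XVIII
for e, o in zip(expected, observed):
    print(f"{e:5.1f} {o:5.1f} {o - e:+5.1f}")
```

The first column printed should agree with the expected values of the table to the first decimal place, and the last column with the differences.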



There is a marked deficiency below 1, and an increasing excess above 3. 
No distinct class of exceptionally high values can be detected, only three 
values exceed 10, and none exceed 15. The causes of additional variability
probably affect all observations in some degree, and are therefore
systematic rather than sporadic. The mean variance is about 50 per cent.
in excess of that due to random sampling. As in Cutler’s 3-plate data 
the departure from expectation is best shown by dividing the distribution 
at the quintiles as in Table XIX. 


Table XIX

χ² = 17.4     P = .0017

     χ²       Expected   Observed      x²
                 m         m + x

   0
                20          12          64
   .4464
                20          15          25
  1.0216
                20          23           9
  1.8326
                20          15          25
  3.2190
                20          35         225

  Total        100         100         348






Such a departure from expectation would occur by chance but once 
in 600 tests; it is therefore clearly significant. The technique used here
did not therefore give results of such accuracy that the variance between 
parallel plates could approximate to the Poisson Series. 

(B) The data of Engberding (12)

The parallel platings given by this author were made to test various 
points connected with the plate method of counting soil bacteria. Some 
of the sets of platings were made on a variety of gelatine and agar media, 
as a test of these. The majority, however, were poured on an agar medium, 
containing “Nahrstoff-Heyden,” that was considered by the author to 
be the best of the media tested. 

Engberding gives 24 sets of plates; of these, 14 are of six plates each,
six of five plates, three of four plates and one of nine plates. Nearly all 
the sets show excessive variability; only three values out of the 24 are 
below the expected average for the corresponding number of plates. The 
total of the 24 values is 5.36 times the expected total. No further test
is necessary; random sampling must be regarded as one of the smaller 
causes of variation in these data. 

(C) The data of Breed and Stocking (13)

We next come to a very thorough attempt made by Breed and 
Stocking to test and improve the methods used in the bacterial analysis 
of milk. The medium used in the platings here considered had the 
following composition: “Difco” peptone 1 per cent., lactose 1 per cent.,
“Lemco” .3 per cent., air dried agar 1.5 per cent. A single batch of
medium was used throughout each experiment, so that ability to reproduce
the medium is not here tested. Parallel samples of normal milk,
and of milk inoculated with B. coli, were analysed by different analysts
and at different stations. Two series of these records have been examined 
by comparing the different plates of each separate analysis. Each series 
yielded 132 sets of three numbers, the duplicate counts of the same set of 
plates being reckoned as two. If the duplicate counts had closely agreed, 
this would tend to give us a bad fit between observation and expectation, 
to the extent of doubling χ². Though the agreement is not sufficiently
great to have this effect, the tendency is to be borne in mind. 

The expected and observed distributions are shown in Table XX. 

As with Buddin’s data, though to a less extent, there is a small 
systematic excess of the larger values; the mean variance in series B 
is about 30 per cent. in excess of expectation, while in series C it is only




about 20 per cent. Series B also shows certain other irregularities and 
possibly the occurrence of sporadic causes of variation. Series C, which 
represents the final perfection of the technique employed, shows no 
excessively variable sets of plates. 


Table XX

    χ²      Expected   Series “B”   Series “C”

    .5        51.9        46           43
   1.5        31.5        35.5         30
   2.5        19.1        14           24
   3.5        11.6         6.5         12
   4.5         7.0        10           10
   5.5         4.3         3            4
   6.5         2.6         5            4
   7.5         1.6         2            2
  over 8       2.4         8            5

  Mean         2.00        2.65         2.45


It is, we believe, possible to indicate the cause of the small systematic 
excess of variance in this exceptionally fine body of data. As has been 
observed, the duplicate counts, which are recorded in full, do not agree 
very closely, and it is possible that what may be called “error of counting”
is responsible for the existing discrepancy. If we consider a typical
pair of duplicate counts such as that shown in Table XXI, we may regard


Table XXI

 Plate    First count   Second count   Difference    Departure
                                                     from mean

   1          70             68           + 2           + 8
   2          61             72           - 11          - 5
   3          54             63           - 9           - 3

 Mean                                     - 6



the mean difference as due to the personal equation of the analyst, and
the departures from the mean as made up of the several “errors of
counting” of the set. If the standard “error of counting” is σ, then the
mean value of the sum of the squares of the three departures will be 4σ².
In this way the standard “error of counting” was estimated for each of
the main groups of observations in Series C, divided according to the
mean number of colonies per plate, and the additional variance ascribable
to “errors of counting” expressed as a percentage of the expected
variance.
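The estimate of the “error of counting” from a pair of duplicate counts can be set out numerically. The sketch below, in Python, is an illustration using the counts of Table XXI and is not the authors' working; the variable names are assumptions. It supposes that the two counts of a plate carry independent errors of variance σ², so that each difference has variance 2σ², and that the three departures about their own mean retain two degrees of freedom, which gives the factor 4.

```python
first = [70, 61, 54]    # first count of each plate (Table XXI)
second = [68, 72, 63]   # second count of the same plates

diffs = [a - b for a, b in zip(first, second)]
mean_diff = sum(diffs) / len(diffs)          # personal equation of the analyst
departures = [d - mean_diff for d in diffs]

# Each difference of two counts has variance 2*sigma^2, and three departures
# about their own mean keep two degrees of freedom, so E[sum of squares] = 4*sigma^2.
sum_sq = sum(d * d for d in departures)
sigma_sq = sum_sq / 4.0
print(diffs, mean_diff, departures, sigma_sq)
```

For this single set the sum of squares is 98, giving an estimate of 24.5 for σ²; in practice the estimate would be pooled over all the sets of a group, as the text describes.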






Table XXII 

Percentage variance due to “errors of counting”

Colonies per plate about ... 36 62 82 161 364 All 

Increased variance per cent. ... 16 % 24 % 13 % 17 % 59 % 22 % 

The effect is thus seen to be a fairly uniform one, though distinctly 
more prominent among the more crowded plates, of which eight pairs of 
triplets were available. The higher value in the second group is perhaps 
due to the fact that these contain the counts of the mixed bacterial 
population in normal milk, while the others are counts of a practically 
pure culture of B. coli. 

The effect ascribable to “errors of counting” is thus of just the right 
magnitude to explain the additional variance observed in Series C. Since 
all the groups are affected similarly and nearly to an equal extent, we may 
anticipate that if this explanation is correct, the actual values of Series C 
will fit the theoretical expectation if a uniform allowance of 20 per cent.
is made for the additional cause of variation. The distributions are so
compared in equal intervals of χ² in Table XXIII, and by sextiles in
Table XXIV.


Table XXIII

χ²           Expectation with        Observed
             20 % allowance
  .6              51.9                 47.5
 1.8              31.5                 35.5
 3.0              19.1                 21
 4.2              11.6                 12
 5.4               7.0                  5
 6.6               4.3                  5
 7.8               2.6                  3
 9.0               1.6                  -
over 9.6           2.4                  3


Table XXIV

χ²           Expectation m with      Observed      x²
             20 % allowance          m + x
0                  22                  14          64
 .4378             22                  28          36
 .9732             22                  24           4
1.6634             22                  21           1
2.6366             22                  28          36
4.3003             22                  17          25
Total             132                 132         166

χ² = 7.545, P = .185 (or P = .584 for half this value of χ²)


The distribution shown in Table XXIII shows a remarkably close
agreement with expectation. A more exact test of agreement is afforded
by the division at the sextiles (Table XXIV); the actual figures show
but a moderately good fit with χ² = 7.545, and P = .185; since however






duplicate counts of the same plates have been taken as independent
observations, χ² has been increased by this cause to some extent short
of doubling, so that we may say that in reality χ² lies between 3.77 and
7.54, while P lies between .584 and .185; neither value could be taken as
indicating a significant departure from expectation.
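The figures quoted from Table XXIV may be checked with a short computation. The sketch below uses the standard finite series for the upper tail of the χ² distribution with a whole number of degrees of freedom; the function name `chi2_sf` is ours, and the choice of five degrees of freedom (six sextile classes less one) is our inference from the quoted values of P rather than a statement in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * (chi + chi^3/3 + chi^5/(3*5) + ...)
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

# Observed numbers in the six sextile classes of Table XXIV,
# each class having expectation 22.
observed = [14, 28, 24, 21, 28, 17]
expected = 22
chi2 = sum((o - expected) ** 2 / expected for o in observed)

df = len(observed) - 1            # assumption: 6 classes, 1 restriction
p_as_counted = chi2_sf(chi2, df)  # duplicates treated as independent
p_if_halved = chi2_sf(chi2 / 2, df)  # chi-square reduced by one half
```

The two probabilities bracket the true value, since the inflation of χ² by duplicate counting is stated to fall short of doubling.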

We believe, therefore, that in this material, at all events in Series C,
the somewhat severe conditions under which the Poisson Series is
produced, were in reality fulfilled, and that the departure of the
observations from expectation could have been eliminated had precautions been
taken to secure a sufficiently accurate counting of the colonies. It must
however be borne in mind that the material employed consisted in
nearly all cases of almost pure cultures of B. coli in milk. The case cannot
therefore be compared closely to the different problem of counting such
a mixed bacterial flora as occurs in soil, where many different types of
organisms, whose growth may be mutually harmful, occur on the plates.

The interference on the plates between dissimilar organisms cannot
here be seen, neither can the capability of the medium to check this
interference be studied. In this material, for example, there would be
little danger of frequent interference by “spreading” organisms, whose
growth, had they occurred, would probably have been stimulated by
such a medium as was used, containing peptone and meat extract.

The lessened accuracy in counting a mixed flora on this medium is
illustrated in Table XXII, where the second group of platings, which
contains counts of uninoculated milk, shows a noticeably higher variance
in counting than the adjoining groups made from milk cultures of B. coli.

The data show, however, that when such a simplified flora is studied, 
an agreement between parallel platings comparable with the expectations 
of random sampling can be obtained. 

(D) The data of W. Owen (14)

One of the most remarkable bodies of data which we have examined 
is that provided by W. Owen in his investigation of various culture 
media for the counting of micro-organisms in cane sugar products. In 
this work, a variety of different media were employed, varying in 
composition, reaction and osmotic pressure. These were tested in counting 
bacteria from a variety of sugar refinery products. From the variety of 
media employed, and from the fact that most of them were new and of 
untested value, it was to be expected that a rather high variance between 
parallel platings would be found over the whole series taken together. 
Had this been the case, separate tests would have been needed of the 




indices of variance on the separate media. In fact, however, no such 
remarkably high variance was found. 

The analyses were performed with sets of six plates, and we have
chosen the first 100 of these sets for examination. The expected and
observed numbers are shown in Table XXV.


Table XXV

χ²        Expected     Observed     Expected 43 %
 .5          3.7          38             1.6
1.5         11.3          15             4.9
2.5         14.9           6             6.5
3.5         15.0           9.5           6.5
4.5         13.4           6             5.8
5.5         11.0           3.5           4.8
6.5          8.5           3             3.7
7.5          6.4           3             2.8

«5 

9*5 

4*7 

3*4 

! 1 

3 5 

lO'i 

2*4 

— 1 


11*5 

1*7 

1 


12*5 

1*1 

1 


13-5 

•8 

n i 

3*4 

14*51 


1 


1.5*5 > 

10 

1 


over 10 ) 


10 / 



The excess of highly variable sets occasions no surprise; we have met 
with this feature in about the same proportion in Cutler’s data. What 
is astonishing in this case is the immense excess of sets less variable, 
and in the majority of cases much less variable, than would be the case 
under undisturbed conditions of random sampling. 

In the fourth column we have shown the expected distribution fitted
to the total number in the range from 2 to 14. This seems to agree with
the distribution observed within this range. We are unwilling to lay
much stress on this explanation since the agreement is based on only
36 observations. If it were accepted it would imply that the conditions
which lead to the Poisson Series were really operative in about 44 per
cent. of the cases, that in at least 10 and probably 11 per cent. excessive
variability has been produced, and in the remaining 45 per cent. the
variability has been abnormally depressed.

The extent to which the differences between the counts of parallel 
plates is diminished seems to put the phenomenon beyond the reach of 
the ordinary explanations; there are some indications, for example, that 
the plates have not been in all cases completely counted, but it is 





difficult to imagine that this cause could be responsible for any such 
bias as is observed, in view of the fact that a probable error is calculated 
separately from each set. Severe competition between colonies on the 
plate is admittedly a possible cause of diminished variability, but we 
cannot imagine it acting with such severity as would be necessary to 
explain these results, especially as in the 38 cases in which χ² is less than
one, the mean number of colonies per plate is always less than 100, and
in 15 cases is less than 10.

In more than one instance all the six plates have an equal number of 
colonies; in samples from a Poisson Series, this would occur but very 
rarely. For 13 colonies on each plate for example, as is recorded in one 
instance, the most favourable assumptions will only allow such a coincidence
once in some 25,000 trials. Since in the majority of these counts
we clearly are not dealing with undisturbed conditions of random 
sampling, the point cannot be pressed further. We do not agree, however, 
with the statement that, when such a coincidence occurs, the probable 
error is zero. 
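The order of magnitude quoted for six equal counts of 13 can be reproduced as follows. This is a sketch under an assumption of ours: that the "most favourable assumptions" amount to taking the total of 78 colonies as given, so that the six counts follow a multinomial distribution with equal chances for each plate. The text does not state how the figure was reached.

```python
from fractions import Fraction
from math import factorial

plates = 6       # plates in the set
count = 13       # colonies recorded on every plate
total = plates * count

# Chance that 78 colonies, each equally likely to fall on any of the
# six plates, divide themselves exactly 13 to a plate:
#     78! / (13!)^6 / 6^78
p_equal = Fraction(factorial(total),
                   factorial(count) ** plates * plates ** total)

# Average number of trials needed for one such coincidence.
trials_per_coincidence = float(1 / p_equal)
```

Without fixing the total, the chance that six independent Poisson counts all equal 13 is smaller still, so this conditional calculation is indeed the more favourable one.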

In reviewing the foregoing data, it seems probable that the action of 
liquefying bacteria, and the development of rapidly growing organisms, 
unchecked by the medium employed, were the main causes of excessive 
variance between parallel platings in the work of Buddin and Engberding 
respectively. 

It appears, however, that the conditions of accuracy, such that the 
development of colonies on parallel platings will form a Poisson Series, 
can be fulfilled in dealing with a simplified bacterial flora (Breed and 
Stocking), and have been approached in dealing with the mixed micro-flora
of soil, where the medium used has been so devised as to check
the excessive development of spreading organisms, as in the case of 
Thornton's medium. It is possible that these conditions of accuracy 
would be fulfilled with greater certainty in the case of a mixed micro-flora, 
if the medium could be further improved so that it checked the growth of 
such harmful organisms as that found in the Leeds soil (p. 345). 

Conclusions 

(1) Under ideal conditions the bacterial counts on parallel plates will 
vary in the same manner as samples from a Poisson Series. When these 
conditions are fulfilled the mean count of a number of plates is a direct 
measure of the density of the bacterial population considered (though 
not, of course, of the total bacterial flora); and the accuracy of such an 
estimate is known with precision. 




(2) For any considerable body of records of sets of parallel plates,
agreement with this theoretical distribution may be tested by means of
the index of dispersion

\[ \chi^2 = \frac{S(x - \bar{x})^2}{\bar{x}}, \]

where x̄ is the mean, and x any individual number of colonies counted
on a plate (see Section 5).

(3) From an examination of several large bodies of data we conclude 
that accurate conformity with the theoretical distribution, though rare, 
is not unattainable. In particular with a carefully improved technique, 
and a relatively simple bacterial flora, we believe that the conditions have 
probably been fulfilled by Breed and Stocking; secondly, by the aid of a 
specially adapted medium Cutler and Thornton have shown that these 
conditions have been accurately reproduced, in the great majority of 
cases, even with the mixed bacterial flora of the soil. 

(4) Any significant departure from the theoretical distribution is a sign
that the mean may be wholly unreliable.

(5) Excessive variance may be produced by the occurrence of certain 
soil organisms, which have been isolated, and which exert a toxic influence 
on other forms, and in one case disturb the counts by multiple colony 
formation. 

(6) Subnormal variance is in our experience indicative of some defect 
in the composition of the medium. 
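The index of dispersion named in conclusion (2) is simple to compute for any one set of parallel plates. The following is a minimal sketch; the function name is ours, and the first set of counts is borrowed from the first counts of Table XXI purely for illustration.

```python
def index_of_dispersion(counts):
    """Return (chi-square, degrees of freedom) for one set of plates.

    chi-square is the sum of squared deviations from the mean count,
    divided by the mean; for n plates there are n - 1 degrees of freedom.
    """
    n = len(counts)
    mean = sum(counts) / n
    chi2 = sum((x - mean) ** 2 for x in counts) / mean
    return chi2, n - 1

# Three parallel plates (illustrative values).
chi2_three, df_three = index_of_dispersion([70, 61, 54])

# Six plates with identical counts give an index of zero, the extreme
# case of subnormal variance discussed in connection with Owen's data.
chi2_equal, df_equal = index_of_dispersion([13] * 6)
```

For a considerable body of sets, the values of χ² so obtained are then sorted into classes and compared with the theoretical distribution, as in Tables XXIII to XXV.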

REFERENCES

(1) Poisson (1837). Recherches sur la probabilité des jugements. Paris.

(2) “Student” (1907). On the error of counting with haemocytometer. Biometrika,
v. 351.

(3) Fisher, R. A. (1921). On the mathematical foundations of theoretical statistics.
Phil. Trans. A. ccxxii. 309.

(4) Elderton, P. (1902). Tables for testing goodness of fit. Biometrika, i. 155.

(5) Pearson, K. (1914). Tables for statisticians and biometricians. Camb. Univ.
Press.

(6) Fisher, R. A. (1922). On the interpretation of χ² from contingency tables,
and the calculation of P. J.R.S.S. lxxxv. 87.

(7) Soper, H. E. (1914). Tables of Poisson’s exponential binomial limit.
Biometrika, x. 25.

(8) Whitaker, L. (1914). On the Poisson law of small numbers. Biometrika, x.
36.

(9) Bateman, H. (1910). Note on the probability distribution of α particles. Phil.
Mag. Series vi. vol. xx. p. 704.

(10) Bortkiewicz, L. von (1898). Das Gesetz der kleinen Zahlen. Leipzig.

(11) Thornton, H. G. (1922). On the development of a standardised agar medium
for counting soil bacteria, with especial regard to the repression of spreading
colonies. Annals of Applied Biology, ix.

(12) Engberding, D. (1909). Vergleichende Untersuchungen über die Bakterienzahl
im Ackerboden in ihrer Abhängigkeit von äusseren Einflüssen. Centralblatt
für Bakteriologie, Abt. ii. Bd. 23, p. 569.

(13) Breed, R. S. and Stocking, W. A. (1920). The accuracy of bacterial counts
from milk samples. New York Agr. Exp. Station. Technical Bulletin,
No. 75.

(14) Owen, W. (1914). Investigation of the comparative values of various culture
media for the quantitative determination of microorganisms in cane sugar
products. Centralblatt für Bakteriologie, Abt. ii. Bd. 42, p. 335.

(15) Buddin, W. (1914). Partial sterilisation of soils by volatile and non-volatile
antiseptics. Journal of Agricultural Science, vi. 417.

(16) Wyant, Z. N. (1921). A comparison of the technic recommended by various
authors for quantitative bacteriological analysis of soil. Soil Science, xi. 295.

(17) Cutler, D. W., Crump, L. M., and Sandon, H. (1922). A quantitative
investigation of the bacterial and protozoan population of the soil, with an account
of the protozoan fauna. Phil. Trans. Roy. Soc. B.





5 

ON THE INTERPRETATION OF CHI SQUARE 
FROM CONTINGENCY TABLES, AND THE 
CALCULATION OF P 


AUTHOR’S NOTE 

This short paper, with all its juvenile inadequacies, yet did something
to break the ice. Any reader who feels exasperated by its tentative
and piecemeal character should remember that it had to find its way
to publication past critics who, in the first place, could not believe
that Pearson’s work stood in need of correction, and who, if this had
to be admitted, were sure that they themselves had corrected it.

The writer’s point of view was that certain inconsistencies were
manifest in Pearson’s uses of the χ² test, and that certain other writers,
notably Yule and Greenwood (1915) and Bowley (1920), had
felt that something was wrong, without discovering what it was.

In the writer’s thought, though not very explicitly in this paper,
the mathematical distribution given by tables of χ² was that of the
sum of n squares of variates normally and independently distributed
about zero with unit variance. χ² in fact was the square of the distance
of a random point from the centre of a homogeneous normal
distribution in n dimensions. The number of dimensions, however,
would be reduced by unity for every restriction upon deviations between
expectation and observation, and it appeared that the inconsistencies
in the literature could be straightened out if account were
taken of the true number of degrees of freedom in which observation
and expectation might in reality differ.

In the examples chosen the complete equivalence of different rational
approaches, once the question of degrees of freedom is rectified,
was still somewhat obscured by minor inconsistencies in the method
of computation. These arise from the fact that the mathematical
distribution is only realised in the limit for large samples, where
non-linear restrictions tend to linearity.

* Reprinted from Journal of the Royal Statistical Society, Vol. LXXXV, Pt. I,
pp. 87-94, 1922.





The treatment, in a footnote, of the multinomial distribution as a
section of a multiple Poisson distribution of independent variates
shows the most direct demonstration of the mathematical distribution
tabled, as the sampling distribution of χ² defined as a measure
of discrepancy between observation and hypothesis. This analytical
artifice has, I hope, a lasting value.

The algebraic re-examination of Pearson’s (1900) proof that the
fitting of constants did not affect the distribution of χ² was set out
in 1924 (Paper 8), in which the effects of inefficient fitting also are
worked out.





On the Interpretation of χ² from Contingency Tables,
and the Calculation of P.

By R. A. Fisher, M.A., Rothamsted Experiment Station.

It is well known that the Pearsonian test of goodness of fit depends
upon the calculation of the quantity χ² so defined that if m is the
number of observations expected in any cell, and m + x the number
observed, then

\[ \chi^2 = S\left(\frac{x^2}{m}\right), \]

the summation being extended to all the cells.

Pearson has shown (1) that when the deviations are distributed
with the sole restriction that their sum shall be zero, the distribution of
χ² is given by the Pearsonian curve of Type III,

\[ df \propto \chi^{n'-2} e^{-\frac{1}{2}\chi^2}\, d\chi, \]

where n' is the number of cells.

We are not here concerned to criticise the general adequacy of
the χ² test, which is certainly valid if the number of observations in
each cell is large, but to emphasize the importance of the limitation
italicized above. For the χ² test has been applied by Pearson and
others to contingency tables, in which the sum of the deviations in
any row or column is necessarily zero.





In these cases we shall show that Elderton’s Tables of Goodness
of Fit (2) may still be applied, but that the value of n' with which
the table should be entered is not now equal to the number of cells,
but to one more than the number of degrees of freedom in the
distribution. Thus for a contingency table of r rows and c columns we
should take n' = (c − 1)(r − 1) + 1 instead of n' = cr. This
modification often makes a very great difference to the probability
(P) that a given value of χ² should have been obtained by chance.
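The size of the difference may be seen from a small numerical sketch. The fourfold table below is hypothetical, the helper `p_value` is ours, and it covers only the two cases needed here (n' = 2 and n' = 4, that is, one and three degrees of freedom), using the closed forms of the χ² tail for those cases in place of Elderton's printed tables.

```python
import math

def p_value(chi2, n_prime):
    """Upper-tail P for chi-square entered with n' = 2 or n' = 4.

    Entering with n' corresponds to n' - 1 degrees of freedom.
    """
    half = chi2 / 2.0
    tail = math.erfc(math.sqrt(half))        # one degree of freedom
    if n_prime == 2:
        return tail
    if n_prime == 4:                         # three degrees of freedom
        return tail + math.sqrt(2.0 * chi2 / math.pi) * math.exp(-half)
    raise ValueError("this sketch only covers n' = 2 and n' = 4")

# Hypothetical fourfold table of observed frequencies.
table = [[10, 20],
         [20, 10]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# chi-square = S(x^2 / m), with m taken from the marginal totals.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        m = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - m) ** 2 / m

r, c = len(table), len(table[0])
n_correct = (c - 1) * (r - 1) + 1   # one more than the degrees of freedom
n_cells = c * r                     # the number of cells

p_correct = p_value(chi2, n_correct)
p_cells = p_value(chi2, n_cells)
```

With these figures the value of P obtained by entering with the number of cells is several times larger than that obtained with the correct n', so that a discrepancy which is clearly significant would be passed as unremarkable.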

The most general way of proving this result consists in regarding
the values of x (above) as independent co-ordinates in generalised
space; then owing to the linear relations by which the deviations
are restricted, for example that the marginal totals of the population
should be equal to those observed, all possible sets of observations
will lie, relative to the centre of the distribution, specified by
the assumed population, in a plane space, of the same number of
dimensions as there are degrees of freedom. The frequency density
at any point in this space is proportional to

\[ e^{-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} + \cdots\right)}, \]

when the sample is sufficiently great for the distribution of x to be
regarded as normal, and where σ₁, σ₂, . . . represent the standard
deviations of x₁, x₂, . . . . To determine what values have to be
assigned to the σ’s when the x’s are entirely independent, we must
take account of the variation in the total number,

\[ S(m + x) = N. \]

Since the different values of x are independent,

\[ \sigma_N^2 = S(\sigma^2). \]

The variation of x may be regarded as due to two independent
causes, namely the variation of N, and the variation of the proportion,
which falls into any one compartment; we have therefore the
series of equations,

\[
\begin{aligned}
\sigma_1^2 &= p_1 q_1 N + p_1^2 \sigma_N^2, \\
\sigma_2^2 &= p_2 q_2 N + p_2^2 \sigma_N^2,
\end{aligned}
\tag{1}
\]

and so on, where p_s is the chance of any observation falling in the
cell (s).

Summing these, we find

\[ \sigma_N^2 = N\, S(pq) + \sigma_N^2\, S(p^2), \]

whence, since S(p) = 1 and p − p² = pq,

\[ \sigma_N^2 = N. \]




Substituting in (1),

\[
\begin{aligned}
\sigma_1^2 &= (p_1 q_1 + p_1^2) N = p_1 N = m_1, \\
\sigma_2^2 &= (p_2 q_2 + p_2^2) N = p_2 N = m_2,
\end{aligned}
\]

and so on.

Whence

\[ S\left(\frac{x^2}{\sigma^2}\right) = S\left(\frac{x^2}{m}\right) = \chi^2, \]

and the frequency density at any point in the generalised space is

\[ e^{-\frac{1}{2}\chi^2}. \]

The surfaces of equal density are therefore the series of similar
and coaxial ellipsoids, χ = constant; and since χ measures the
linear dimensions of the corresponding ellipsoid, which by a
homogeneous strain passes into a sphere, and since the plane space in



* It is worth noting that the exact form of the distribution of N observations
into a number of cells is given by the multinomial expansion

\[ \left(\frac{m_1}{M} t_1 + \frac{m_2}{M} t_2 + \cdots + \frac{m_s}{M} t_s\right)^N, \]

of which the coefficient of

\[ t_1^{a_1} t_2^{a_2} \cdots t_s^{a_s} \]

is the chance of the particular distribution a₁, a₂, . . ., a_s.

This may be regarded as a plane section of a distribution in which a₁, a₂, . . ., a_s
are independently distributed according to the Poisson series

\[ e^{-m} \frac{m^a}{a!} \]

for all a, while the total,

\[ \Sigma(a) = N, \]

will be distributed in the Poisson series,

\[ e^{-M} \frac{M^N}{N!}, \]

in which

\[ \Sigma(m) = M. \]

Hence the chance of the given series of observed frequencies, subject to the
restriction that the total number shall be N, will be

\[ \frac{N!}{a_1!\, a_2! \cdots a_s!} \left(\frac{m_1}{M}\right)^{a_1} \left(\frac{m_2}{M}\right)^{a_2} \cdots \left(\frac{m_s}{M}\right)^{a_s}, \]

the general term of the multinomial expansion.

This general case, however, in which the values of a may be small integers,
extends beyond the range in which χ² may be considered an adequate test of
goodness of fit.

Author’s Addendum, 1947. As the sample is increased without limit, so are
all the expectations m, and each Poisson distribution tends to normality with
mean m and variance m. Hence

\[ \Sigma \frac{(a - m)^2}{m} \]

is distributed in the limit as is the sum of the squares of s normal deviates
distributed independently each with unit variance. The section of this
distribution made by imposing the condition M = N is of s − 1 dimensions, and further
linear restrictions will each reduce the number of dimensions by unity.



which the observations lie passes through the point, χ = 0, the
total frequency in the range dχ must be proportional to

\[ \chi^{n'-2} e^{-\frac{1}{2}\chi^2}\, d\chi, \]

where n' is one more than the number of the degrees of freedom.
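The remark in the footnote, that the multinomial distribution is a plane section of a set of independent Poisson distributions, can be checked numerically. The expectations and observed frequencies in this sketch are hypothetical, and the names are ours.

```python
from math import exp, factorial, prod

def poisson(k, mean):
    """Term of the Poisson series: exp(-mean) * mean^k / k!."""
    return exp(-mean) * mean ** k / factorial(k)

m = [2.0, 3.0, 5.0]     # hypothetical expectations in three cells
a = [1, 4, 2]           # hypothetical observed frequencies
M = sum(m)
N = sum(a)

# Chance of the observed frequencies when the cells are independent
# Poisson variates, divided by the chance that their total is N.
joint = prod(poisson(ai, mi) for ai, mi in zip(a, m))
conditional = joint / poisson(N, M)

# General term of the multinomial expansion with chances m/M.
multinomial = (factorial(N) / prod(factorial(ai) for ai in a)
               * prod((mi / M) ** ai for ai, mi in zip(a, m)))
```

The two quantities agree to within rounding error for any choice of expectations and frequencies, the factors in e^{-m} cancelling against e^{-M}.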
Ex. 1. In the fourfold table,

      a          b          a + b
      c          d          c + d
    a + c      b + d      a + b + c + d

when the marginal totals are fixed, there remains only one degree
of freedom. Consequently we must take n' equal to 2 and not to 4.
We are thus led to perceive that χ is distributed so that

\[ df \propto e^{-\frac{1}{2}\chi^2}\, d\chi. \]

This fact resolves a difficulty which has been felt with respect
to the fourfold table. In 1915 Greenwood and Yule (3), using fourfold
tables to test the effect of inoculation against typhoid and
cholera, follow Pearson in applying Elderton’s table with n' = 4.
They notice, however, that if we calculate the proportion attacked
among the inoculated and among the uninoculated, thus

\[ p = \frac{a}{a+b}, \qquad p' = \frac{c}{c+d}, \]

then the difference p' − p, compared to its probable error, should
also give a test of independence; they find, in practice, that
deviations which judged by the χ² test are not improbable, seem
much less likely to occur when judged by the proportions attacked.
While pointing out the difficulty, these authors judge it safer to
apply the χ² test.

When we recognise that we should take n' = 2, the difficulty
disappears, for the standard error of p' − p is

\[ \sqrt{\frac{(a+c)(b+d)}{(a+b)(c+d)(a+b+c+d)}}, \]

so that if

\[ x = p' - p = \frac{bc - ad}{(a+b)(c+d)}, \]

then

\[ \frac{x^2}{\sigma_x^2} = \frac{(bc - ad)^2 (a+b+c+d)}{(a+b)(c+d)(a+c)(b+d)} = \chi^2, \]

and χ, for n' = 2, is, as we have shown above, distributed over the
positive half of a normal curve, with unit standard deviation.

The two tests are, therefore, in reality identical when the χ² test is
rightly applied.
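The identity of the two tests can be verified on any fourfold table. The counts below are hypothetical and the variable names are ours; the sketch compares the squared ratio of the difference of proportions to its standard error with χ² computed both from the closed formula and directly from the four cells.

```python
# Hypothetical fourfold table: rows (a, b) and (c, d).
a, b, c, d = 12, 88, 30, 70
n = a + b + c + d

# Proportions attacked in the two rows, and their difference.
p = a / (a + b)
p_dash = c / (c + d)
x = p_dash - p

# Sampling variance of the difference, using the pooled proportions
# given by the marginal totals.
var_x = (a + c) * (b + d) / ((a + b) * (c + d) * n)
ratio_squared = x * x / var_x

# chi-square from the closed formula for the fourfold table.
chi2_formula = (b * c - a * d) ** 2 * n / (
    (a + b) * (c + d) * (a + c) * (b + d))

# chi-square as S(x^2 / m) over the four cells.
cells = [[a, b], [c, d]]
row_totals = [a + b, c + d]
col_totals = [a + c, b + d]
chi2_cells = sum(
    (cells[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2) for j in range(2))
```

Since the three quantities coincide, referring the ratio x/σ to the normal curve and referring χ² to Elderton's table with n' = 2 must give the same value of P.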

Dr. Bowley (5, 1921) has avoided this inconsistency by
distinguishing the use of χ² in contingency tables from its use in testing
goodness of fit. For the fourfold table he shows that if

\[ x = \frac{ad - bc}{a + b + c + d}, \]

then also

\[ \frac{x^2}{\sigma_x^2} = \chi^2, \]

and consequently, x being normally distributed, he uses the table
of the probability integral. Thus three different tests of significant
association in the fourfold table all lead to the same value of P,
and this is what we should expect, since there is but one degree of
freedom in the fourfold table, when the marginal totals are fixed.

It should be pointed out that certain of Pearson’s “Tables
for Statisticians and Biometricians,” namely, Tables XVII, XIX
and XX, together with XXII (Abac to determine r_P), are all
calculated on the assumption that n' = 4 in fourfold tables, and
consequently should not be used when, as is almost always the
case, the marginal proportions are obtained from the data.*


* I am indebted to Dr. Greenwood for pointing out to me that Pearson (6)
has recognised that in some cases the value of n', with which Elderton’s tables
should be entered, ought to be reduced when linear restrictions are placed upon
the observations. It would appear, however, that Pearson at that date drew a
distinction between “linear relations imposed on the cell contents” and the
restrictions which are introduced by our methods of reconstructing the
hypothetical population from which the sample is regarded as drawn. Thus we find
in Section 1 (p. 145) the introductory explanation, “Actually we find in the
sample M the number m_uv..., and the problem arises whether the system
represented by m_uv... is so improbable that in the selected population M
the characteristics A, B, C . . . L, cannot be considered independent,
i.e. M is really not a random sample of the supposed population N. Clearly
the answer to this problem has already been given. We have to find the
value of χ²” (stated in full notation for l variates), “and apply the tables


Ex. 2. A further verification is possible in the case of the table
with two rows and s columns.

    f₁      f₂      . . .    f_s      N
    f₁'     f₂'     . . .    f_s'     N'


for ‘goodness of fit.’ Of course, in many cases the sampled population is
not known, and accordingly we can only put for” the marginal totals “the
values given by the sample itself, and test from this substitution the degree
of divergence from independence.”

From this passage, and from the fact that throughout the paper no correction
is suggested of the methods previously employed, and embodied in the
Tables for Statisticians published only the year before, it is clear that Pearson
did not recognise that in all cases linear restrictions imposed upon the
frequencies of the sampled population, by our methods of reconstructing that
population, have exactly the same effect upon the distribution of χ² as have
restrictions placed upon the cell contents of the sample.

That the true distribution of χ² for the fourfold table was not recognised
at this date may be inferred also from the fact that the criterion for differential
death-rates, obtained as an approximation by very indirect methods, and
applied correctly in a subsequent paper (7), namely:

\[ Q^2 = S\left\{ \frac{\dfrac{a a'}{a + a'} \left(\dfrac{d}{a} - \dfrac{d'}{a'}\right)^2}{\dfrac{d + d'}{a + a'} \left(1 - \dfrac{d + d'}{a + a'}\right)} \right\}, \]

the summation being taken over all age groups, when a, a', d, d' are the numbers
exposed to risk and the numbers dying, in the two districts, follows at once
from the fourfold table:

                     District A.    District B.    Total.
Surviving              a − d          a' − d'       a + a' − d − d'
Dying                  d              d'            d + d'
Exposed to risk        a              a'            a + a'

for which

\[ \chi^2 = \frac{(a + a')(a d' - a' d)^2}{a\, a' (d + d')(a + a' - d - d')}. \]

We thus obtain independent values of χ from the several age groups, and
since χ for a fourfold table is normally distributed, the distribution of

\[ Q^2 = S(\chi^2) \]

for u age groups must be exactly given by that of χ² in Elderton’s tables,
when n' = u + 1.







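Treating this as a contingency table,

\[ \chi^2 = S\left\{ \frac{\left(f - N\dfrac{f + f'}{N + N'}\right)^2}{N\dfrac{f + f'}{N + N'}} + \frac{\left(f' - N'\dfrac{f + f'}{N + N'}\right)^2}{N'\dfrac{f + f'}{N + N'}} \right\}, \]

the summation taken over all the columns.
Simplifying, we obtain,

\[ \chi^2 = N N'\, S\left\{ \frac{\left(\dfrac{f}{N} - \dfrac{f'}{N'}\right)^2}{f + f'} \right\}, \]

while the number of degrees of freedom is s − 1, so that we must
enter Elderton’s tables with n' = s.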
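The simplification may be confirmed numerically. In the sketch below the two rows of frequencies are hypothetical and the names are ours; χ² is computed once cell by cell from the marginal totals and once from the simplified expression.

```python
# Hypothetical table with two rows and s = 3 columns.
f = [20, 30, 50]          # first sample
f_dash = [30, 30, 20]     # second sample
N = sum(f)
N_dash = sum(f_dash)
grand_total = N + N_dash

# chi-square treated as a contingency table: expectations are formed
# from the row totals and the column totals.
chi2_direct = 0.0
for u, v in zip(f, f_dash):
    m_top = N * (u + v) / grand_total
    m_bottom = N_dash * (u + v) / grand_total
    chi2_direct += (u - m_top) ** 2 / m_top + (v - m_bottom) ** 2 / m_bottom

# Simplified form: N * N' * S{ (f/N - f'/N')^2 / (f + f') }.
chi2_simplified = N * N_dash * sum(
    (u / N - v / N_dash) ** 2 / (u + v) for u, v in zip(f, f_dash))

# Degrees of freedom s - 1, so Elderton's table is entered with n' = s.
n_prime = len(f)
```

The two values agree, so that the only point at issue between the two methods is the value of n' with which the table is entered.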

Now Pearson (4) has developed a special test to be applied when
we wish to know if two independent distributions are likely to be
random samples from the same population. He arrives at the
value of χ² obtained above by reducing the table to a simple series
of s cells; so that this special method is in reality exactly the same
as the direct application of χ² to the table, save that we take n' equal
to s, and not to 2s. This latter discrepancy is not, however,
discussed in (4), or in the later paper (6), and the correct application
of χ² to contingency tables of two or more variates has never been
made clear.


Summary. 

The χ² test may be applied to contingency tables, provided we
take not the number of cells but one more than the number of degrees
of freedom for n'.

So modified, the χ² test includes as special cases:

(i) the comparison of ratios in the fourfold table;

(ii) Pearson’s method of comparison of distributed samples;

(iii) Pearson and Tocher’s criterion of differential death rates.

The proof which we have given of the distribution of χ² is applicable
not only to contingency tables, but to all cases in which the
frequencies observed are connected with those expected by a number
of linear relations, beyond their restriction to the same total
frequency. In testing the goodness of fit of a frequency curve fitted by
means of four moments, the number of degrees of freedom has been
reduced by 4, and since the four moments are linear functions of the
class frequencies, we should take n' to be 4 less than the number
of cells. In this case it should be noted that it is usual, and
convenient, to calculate the moments from a finer graduation than that
which we use in testing goodness of fit, and in consequence the
restricted plane region in which the observations lie will not pass
exactly through the point χ = 0; the distribution of χ², calculated
from 4 less than the number of cells, will none the less be closely
accurate even in these cases, and far more accurate than that
obtained by putting n' equal to the number of cells.

In all cases, therefore, of applying the χ² test, it is necessary to
take account of the number of degrees of freedom of the observations
in relation to the expected distribution, to which they are compared;
in cases where all the restrictions are of a linear character the correct
distribution of χ² may be found from Elderton’s tables, or, if n' = 2,
from a table of the probability integral, while in the case of
restrictions of a non-linear character, Elderton’s tables are no longer
exactly applicable.


(1) K. Pearson (1900). On the criterion that a given system of deviations
from the probable in the case of a correlated system of variables is such that
it can be reasonably supposed to have arisen from random sampling. Phil.
Mag., l, 157.

(2) K. Pearson (1914). Tables for Statisticians and Biometricians.
Cambridge University Press.

(3) G. U. Yule and M. Greenwood (1915). The statistics of antityphoid
and anticholera inoculations, and the interpretation of such statistics in
general. Proc. Roy. Soc. of Medicine, Section of Epidemiology and State
Medicine, viii, 113.

(4) K. Pearson (1911). On the probability that two independent
distributions are really samples of the same population. Biometrika, viii, 250.

(5) A. L. Bowley (1920). Elements of Statistics, p. 371. London: P. S.
King & Sons.

(6) K. Pearson (1915). On the theories of multiple and partial
contingency. Biometrika, xi, 145.

(7) K. Pearson and J. F. Tocher (1915). On criteria for the existence of
differential death-rates. Biometrika, xi, 159.





6 

THE GOODNESS OF FIT OF REGRESSION FORMULAE
AND THE DISTRIBUTION OF
REGRESSION COEFFICIENTS


AUTHOR’S NOTE

Paper 5 had shown how χ² could be used correctly to test the goodness
of fit of frequencies. It was natural to follow it by an investigation
of the goodness of fit of regression lines. This is a more difficult
problem, and a more maturely written paper. It is shown that the
χ² distribution supplies only an approximation, the true distribution
being that later known as z or in the analysis of variance. Before 
its general applicability was recognised the z distribution kept turning
up unexpectedly. Its relationships and uses were first summarised
(Paper 8) in 1924. It is treated here as a modified χ². It is shown
that the method may be extended to non-linear regression, and enables
a correct interpretation to be put on the “correlation ratio.”

Section 6 takes up a second topic, connected with the first only by
arising also in regression data. It is shown that the significance of
the coefficients of regression formulae, linear or non-linear, simple or
multiple, may be treated exactly by “Student’s” t-test.


* Reprinted from Journal of the Royal Statistical Society, Vol. LXXXV, Pt. IV,
pp. 597-612, 1922.





The Goodness of Fit of Regression Formulae, and the
Distribution of Regression Coefficients.

By R. A. Fisher, M.A.

Introduction.

The widespread desire to introduce into statistical methods some
degree of critical exactitude has led to the employment, now
general in careful work, of the two types of quantity which characterize
modern statistics, namely, the “probable error” and the
test of “goodness of fit.” The test of goodness of fit was devised
by Pearson, to whose labours principally we now owe it, that the
test may readily be applied to a great variety of questions of
frequency distribution. It is an essential means of justifying
a posteriori the methods which have been employed in the reduction
of any body of data. Slutsky and Pearson have extended the
test to apply also to the fitness of regression formulae, Pearson’s
correlation ratio having also been employed for this purpose.

It has been shown in a previous communication [2, Fisher, 1922]
that the $\chi^2$ test of goodness of fit can be accurately applied only
if allowance is made for the number of constants fitted in reconstructing
the theoretical population. This correction is particularly
important in contingency tables, but is necessary in all cases; and
the fact that it has not been recognized has led to the adoption
of erroneous values in almost all the cases in which tests of goodness
of fit have been employed. The values of P have been exaggerated,
and it is to be feared that in many cases wrong conclusions have
been drawn from the values of P obtained.

It has, therefore, been necessary to extend the examination to
the tests of goodness of fit of regression lines. The errors due to
neglecting the number of constants fitted are here very pronounced;
but in addition other points have to be taken into consideration,
which did not arise in our previous investigation. In the most
important class of cases the curve of distribution of $\chi^2$ is now no
longer of the Pearsonian Type III, which is the basis of Elderton’s
tables, but of the neighbouring Type VI. Certain misconceptions
also exist as to the form of the distribution of the correlation
ratio, which we hope to have cleared up. We have also taken
the opportunity of solving the outstanding problem of the distribution
of the regression coefficients in small samples.

1. The accurate application of Elderton's tables.

With any two variables x and y we shall suppose that the
number of observations for which $x = x_p$ is $n_p$, and the number
of these for which $y = y_q$ is $n_{pq}$; also that $\bar{y}_p$ is the mean of the
observed values of y for a given value of x, so that

$$n_p\,\bar{y}_p = S_q(n_{pq}\,y_q).$$

We may regard the group $n_p$ as a random sample from a population
in which the value of x is constant; but the value of y varies
freely about a certain mean, $m_p$, with a certain standard deviation,
$\sigma_p$.

For such samples of $n_p$, therefore, the mean, $\bar{y}_p$, will vary about
the same mean $m_p$, and since this mean of $\bar{y}_p$ is independent of
the number in the array, $m_p$ will be the mean of all values of $\bar{y}_p$
from random samples, however the number $n_p$ may vary.

Any opinion put forward by Professor Pearson is worthy of
respect; but it is impossible to agree with his statement [1, p. 240]
that “This result cannot be taken as obvious, as the size of the
array in the sample varies.” The fact, however, Pearson has
verified for large samples as far as the third order of approximation.
The difference in principle is of some importance, since the simplicity
of many of the results here obtained is a consequence of the fact
that we have not attempted to eliminate known quantities, given
by the sample, from the distribution formulae of the statistics
studied, but only the unknown quantities—parameters of the population
from which the sample is drawn—which have to be estimated
somewhat inexactly from the given sample.*

Next, for arrays of any given size, the standard deviation of $\bar{y}_p$
is $\sigma_p/\sqrt{n_p}$, and it will be normally distributed if the population-array
be normal, and approximately so in most cases if $n_p$ be large.
Pearson rightly points out that the values of $\bar{y}_p$ for arrays of
different sizes will not be normally distributed, but the distribution
will be markedly leptokurtic even for considerable arrays. This
result follows from the fact that the distribution is a mixture of
normal distributions, having the same mean, but different standard
deviations. This mixed distribution need not concern us, however,
for in applying tests of fitness we do not in practice ignore the size
of the array. The simple fact is, that, when the population arrays
are normal, the quantity

$$z_p = \sqrt{n_p}\,(\bar{y}_p - m_p)$$

is normally distributed about zero, with a standard deviation $\sigma_p$,
and this distribution is independent of the size of the array.

* Statistics whose sampling distribution depends upon other statistics
given by the sample cannot, in the strict sense, fulfil the Criterion of Sufficiency.
In certain cases evidently no statistic exists which strictly fulfils this criterion.
In these cases statistics obtained by the Method of Maximum Likelihood appear
to fulfil the Criterion of Efficiency; the extension of this criterion to finite
samples thus takes a new importance.

In the case when the population arrays are equally variable,
$\sigma_p$ is constant [$= \sigma$], and if there are a arrays, the quantity

$$S(z_p^2) = S\{n_p(\bar{y}_p - m_p)^2\}$$

is the sum of the squares of a independent, normally and equally
variable quantities, and consequently, if we write

$$\sigma^2\chi^2 = S(z_p^2),$$

$\chi^2$ will be distributed as is the ordinary measure of goodness of fit.
In applying Elderton’s tables we must, of course, put $n'$ equal to
one more than the number of degrees of freedom, as I have demonstrated
elsewhere [2]. If the values of $m_p$ were known a priori,
we should take $n' = a + 1$, but for regression formulae fitted to
the data by equations linear in $\bar{y}_p$ we merely reduce the number
of degrees of freedom by the number of constants fitted. Thus,
if $m_p$ is a linear function of x, and a straight line is fitted, we have
$n' = a - 1$, and the value of $\chi^2$ then constitutes a test of whether
or not $m_p$ is in reality adequately represented by a linear function
of x. Similarly, if a cubic polynomial in x be fitted, we have
$n' = a - 3$.
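The rule of this section can be sketched in a few lines of Python. This is only an illustration under the stated assumption that $\sigma$ is known; the function names `chi_square_known_sigma` and `elderton_n_prime` are hypothetical and do not come from the paper.

```python
def chi_square_known_sigma(sizes, means, fitted, sigma):
    """chi^2 = S{ n_p (ybar_p - Y_p)^2 } / sigma^2, with sigma assumed known."""
    return sum(n * (ybar - y_fit) ** 2
               for n, ybar, y_fit in zip(sizes, means, fitted)) / sigma ** 2


def elderton_n_prime(num_arrays, constants_fitted):
    """Value n' for entering Elderton's table: degrees of freedom plus one."""
    degrees_of_freedom = num_arrays - constants_fitted
    return degrees_of_freedom + 1


# Ten arrays: means known a priori, a fitted straight line, a fitted cubic.
print(elderton_n_prime(10, 0))  # a + 1
print(elderton_n_prime(10, 2))  # a - 1
print(elderton_n_prime(10, 4))  # a - 3
```

A straight line uses two constants and a cubic four, which is why the table is entered with $a - 1$ and $a - 3$ respectively.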

2. The exact distribution of $\chi^2$ when $\sigma$ is determined from the data.

So far the results are exact on the assumption that $\sigma$ is known;
but as in practice $\sigma$ must usually be obtained from the data, errors
will be introduced from this source which necessarily influence
the distribution of $\chi^2$. It is true that $\sigma$ may be estimated from
the whole data, and is therefore known with accuracy of a higher
order than the quantities which contribute to $\chi^2$; nevertheless it is
necessary to determine what aberrations are to be expected when
the data are not very numerous.

From each array we can directly calculate the second moment
$s_p^2$, and it has been shown [3] that the second moment of a normal
sample of $n_p$ is so distributed that the frequency with which it falls
into the range $d(s_p^2)$ is proportional to

$$\frac{1}{\sigma^{\,n_p-1}}\,(s_p^2)^{\frac{n_p-3}{2}}\,e^{-\frac{n_p s_p^2}{2\sigma^2}}\,d(s_p^2);$$

the chance that all the observed values of $s_p^2$ fall in assigned ranges
is the product of a such quantities, for all are distributed independently;
consequently to find the optimum value of $\sigma$, which will
also be the value with the least probable error, we must make this
product a maximum for variations of $\sigma$.

Taking logarithms and differentiating, we have

$$\frac{\partial L}{\partial \sigma} = \frac{S(n_p s_p^2)}{\sigma^3} - \frac{S(n_p - 1)}{\sigma},$$

whence the optimum value of $\sigma^2$ is $s^2$ where

$$(N - a)\,s^2 = S(n_p s_p^2).$$
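As a hedged illustration, the pooled estimate $(N - a)s^2 = S(n_p s_p^2)$ can be computed directly from the arrays. The function name `pooled_variance` and the representation of the data as a list of lists are assumptions made for this sketch only.

```python
def pooled_variance(arrays):
    """Return s^2 defined by (N - a) s^2 = S(n_p s_p^2).

    `arrays` is a list of lists, one list of y values per array (assumed layout).
    """
    total_n = sum(len(ys) for ys in arrays)   # N
    num_arrays = len(arrays)                  # a
    sum_sq = 0.0
    for ys in arrays:
        mean = sum(ys) / len(ys)
        # n_p * s_p^2 equals the sum of squared deviations from the array mean
        sum_sq += sum((y - mean) ** 2 for y in ys)
    return sum_sq / (total_n - num_arrays)


print(pooled_variance([[1.0, 2.0, 3.0], [4.0, 6.0]]))
```

Dividing by $N - a$ rather than $N$ is the point taken up again in Section 5 (ii).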

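We shall, therefore, suppose that $\sigma$ is estimated by this method,
and that

$$\chi^2 = \frac{1}{s^2}\,S\{n_p(\bar{y}_p - Y_p)^2\},$$

$Y_p$ being the value given by the regression formula for the array;
we must now find the distribution of this statistic.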

The distribution of $s^2$ is of the same kind as those with which
we have been concerned. For

$$(N - a)\,s^2 = S\,(y - \bar{y}_p)^2,$$

and may be regarded as the sum of the squares of N equally variable
quantities, independent save for a linear restrictions of the form

$$S_p(y) = n_p\,\bar{y}_p.$$

If, therefore, we specify the distribution of $s^2$ in such a way as
to express the frequency element, df, in terms of the variate
element within which it occurs, we shall have

$$df \propto t^{\frac{N-a-2}{2}}\,e^{-\frac{t}{2\sigma^2}}\,dt,$$

where t stands for $s^2(N - a)$. In the same way if $\tau$ stand for $\chi^2 s^2$,
we have, if $p + 1$ constants have been used in fitting,

$$df \propto \tau^{\frac{a-p-3}{2}}\,e^{-\frac{\tau}{2\sigma^2}}\,d\tau;$$

and these two distributions are independent, for the one depends
only on the deviations from the means of normal samples, and the
other only on the means.

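The distribution of $\chi^2$ will now be that of $(N - a)\,\tau/t$, so,
substituting

$$\tau = \frac{\chi^2\,t}{N - a}$$

in

$$t^{\frac{N-a-2}{2}}\;\tau^{\frac{a-p-3}{2}}\;e^{-\frac{t+\tau}{2\sigma^2}}\;dt\,d\tau,$$

we obtain, ignoring constants,

$$(\chi^2)^{\frac{a-p-3}{2}}\;t^{\frac{N-p-3}{2}}\;e^{-\frac{t}{2\sigma^2}\left(1 + \frac{\chi^2}{N-a}\right)}\;dt\,d\chi^2,$$

and so, on integrating from 0 to $\infty$ with respect to t,

$$df \propto \frac{(\chi^2)^{\frac{a-p-3}{2}}\;d\chi^2}{\left(1 + \dfrac{\chi^2}{N-a}\right)^{\frac{N-p-1}{2}}}.$$

The variation in $s^2$, therefore, changes the exact form of the distribution
curve for $\chi^2$ from Type III to Type VI. The change is,
however, very small if N be large, for as N increases

$$\left(1 + \frac{\chi^2}{N-a}\right)^{-\frac{N-p-1}{2}} \to e^{-\frac{1}{2}\chi^2},$$

and so reproduces the Type III distribution.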
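The passage to the limit can be checked numerically. The sketch below evaluates the normalised Type VI and Type III densities with `math.lgamma`; the function names are illustrative, and the normalising constant is obtained from the Beta integral, writing each factorial $x!$ as $\Gamma(x + 1)$. In modern terms, $\chi^2/(a - p - 1)$ as defined here follows the variance-ratio distribution with $a - p - 1$ and $N - a$ degrees of freedom, a remark added for orientation rather than taken from the text.

```python
import math


def type_vi_pdf(x, N, a, p):
    """Exact (Type VI) density of chi^2 when sigma is estimated from the data."""
    shape = (a - p - 1) / 2.0                      # half the degrees of freedom
    log_const = (math.lgamma((N - p - 1) / 2.0)
                 - math.lgamma((N - a) / 2.0)
                 - math.lgamma(shape)
                 - shape * math.log(N - a))
    log_pdf = (log_const + (shape - 1.0) * math.log(x)
               - (N - p - 1) / 2.0 * math.log1p(x / (N - a)))
    return math.exp(log_pdf)


def type_iii_pdf(x, a, p):
    """Limiting (Type III) density, i.e. chi^2 with a - p - 1 degrees of freedom."""
    shape = (a - p - 1) / 2.0
    log_pdf = (-math.lgamma(shape) - shape * math.log(2.0)
               + (shape - 1.0) * math.log(x) - x / 2.0)
    return math.exp(log_pdf)


for N in (20, 200, 20000):
    ratio = type_vi_pdf(3.0, N, a=6, p=1) / type_iii_pdf(3.0, a=6, p=1)
    print(N, ratio)   # the ratio of ordinates approaches 1 as N grows
```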


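3. The nature of the approximation of the Type VI curve

$$df = \frac{\left(\frac{N-p-3}{2}\right)!}{\left(\frac{N-a-2}{2}\right)!\,\left(\frac{a-p-3}{2}\right)!}\;(N-a)^{-\frac{a-p-1}{2}}\;x^{\frac{a-p-3}{2}}\left(1 + \frac{x}{N-a}\right)^{-\frac{N-p-1}{2}}dx$$

to the Type III curve

$$df = \frac{1}{\left(\frac{a-p-3}{2}\right)!}\;2^{-\frac{a-p-1}{2}}\;x^{\frac{a-p-3}{2}}\;e^{-\frac{x}{2}}\,dx.$$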

When x is small, the two curves have closely similar forms, the
latter being the distribution of $\chi^2$, given by Elderton’s table,
when $n' = a - p$. The ratio of the ordinates at the terminus of
the curve is obtained by expanding the constant multiplier of the
first curve in powers of $N^{-1}$. It reduces to

$$1 + \frac{(n' - 1)(n' - 3)}{4N}$$

for high values of P; therefore, 1 − P, as given by Elderton’s table,
may be corrected by multiplying by this factor. Near the centre
of the curve we may observe the position of the mean and mode.

* The symbol x! is used throughout this paper as equivalent to $\Gamma(x + 1)$,
whether x is an integer or not.



              Mean.                                   Mode.

Type III      $a - p - 1$                             $a - p - 3$

Type VI       $(a - p - 1)\,\dfrac{N-a}{N-a-2}$       $(a - p - 3)\,\dfrac{N-a}{N-a+2}$


The mean, therefore, is raised and the mode lowered in about the
same proportion. For the higher values of x the curves are not
closely similar, and since it is for these especially that the value of P
is required, we shall obtain the necessary correction in P, as far
as the terms in $N^{-1}$. The ratio of the ordinates is

$$1 + \frac{1}{4N}\left\{x^2 - 2(n' - 1)\,x + (n' - 1)(n' - 3)\right\};$$

but, since

$$2^{\frac{n'-1}{2}}\left(\frac{n'-3}{2}\right)!\;P_{n'}(x) = \int_x^{\infty} x^{\frac{n'-3}{2}}\,e^{-\frac{x}{2}}\,dx,$$

we have the correction

$$\frac{n'-1}{4N}\left\{(n'+1)\,P_{n'+4} - 2(n'-1)\,P_{n'+2} + (n'-3)\,P_{n'}\right\},$$

which, in the absence of tables of the Type VI curve, will usually
be found adequate.
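The correction lends itself to direct computation. The sketch below is an illustration only: `elderton_p` is a hypothetical stand-in for Elderton's table, computed from the series for the incomplete gamma function, and `corrected_p` adds the first-order term derived above. Terms of order $N^{-2}$ are neglected, exactly as in the text.

```python
import math


def elderton_p(n_prime, x):
    """P as tabled by Elderton: upper tail of chi^2 with n' - 1 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    s = (n_prime - 1) / 2.0
    z = x / 2.0
    # Series for the regularised lower incomplete gamma function.
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1.0))
    lower = 0.0
    for k in range(1, 2000):
        lower += term
        term *= z / (s + k)
        if term < 1e-17 * lower:
            break
    return 1.0 - lower


def corrected_p(n_prime, x, N):
    """Elderton's P plus the first-order (1/N) correction for the Type VI curve."""
    bracket = ((n_prime + 1) * elderton_p(n_prime + 4, x)
               - 2 * (n_prime - 1) * elderton_p(n_prime + 2, x)
               + (n_prime - 3) * elderton_p(n_prime, x))
    return elderton_p(n_prime, x) + (n_prime - 1) / (4.0 * N) * bracket


# With few observations the corrected P in the upper tail exceeds the tabled P.
print(elderton_p(5, 15.0), corrected_p(5, 15.0, 30))
```

The series used here loses relative accuracy for extremely small tail areas; it is adequate for the range in which a significance test is ordinarily read.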


4. The correlation ratio.

We are now in a position to make an accurate use of the correlation
ratio, as a test of the fitness of regression formulae. Let Y be the
function of x used as regression formula, and let

$$N R^2 s_y^2 = S\{n_p(Y_p - \bar{y})^2\}, \qquad N s_y^2 = S\,(y - \bar{y})^2,$$

where $\bar{y}$ is the mean of all the observed values of y; then it is easy
to see that, provided Y has been fitted to the data so that

$$S\{n_p(\bar{y}_p - Y_p)^2\}$$

is a minimum for proportional variations of $Y - \bar{y}$, then

$$N(1 - R^2)\,s_y^2 = SS\{(y - Y_p)^2\}.$$













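But the correlation ratio is given by the parallel formula,

$$N(1 - \eta^2)\,s_y^2 = SS\{(y - \bar{y}_p)^2\} = (N - a)\,s^2;$$

hence, by subtraction,

$$N(\eta^2 - R^2)\,s_y^2 = S\{n_p(\bar{y}_p - Y_p)^2\} = \chi^2 s^2.$$

In other words

$$\chi^2 = \frac{N s_y^2\,(\eta^2 - R^2)}{s^2} = (N - a)\,\frac{\eta^2 - R^2}{1 - \eta^2},$$

and to test the significance of $\eta^2 - R^2$ we enter Elderton’s table
with $n' = a - p$, where $p + 1$ is the number of constants fitted to
the regression line. Thus, for a linear regression formula, $R = r$,
and

$$\chi^2 = (N - a)\,\frac{\eta^2 - r^2}{1 - \eta^2}, \qquad n' = a - 1,$$

using, if necessary, the correction for Type VI as before.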

The exact form of the distribution of $\eta$ itself would be difficult
to obtain, but in practice $\eta$ is usually employed to test the validity
of a linear or other regression formula. For this purpose it is not the
distribution of $\eta$ but of the more variable quantity $(\eta^2 - R^2)/(1 - \eta^2)$
that is required, and the above expressions show it is approximately
represented by a Type III curve, and that the probability of a greater
discrepancy occurring by chance may be obtained from Elderton’s
table.


5. Comparison with previous formulae.

Slutsky, in his method [4, p. 83] of treating homoscedastic
data, has used a process analogous to that arrived at above, but
with four deviations:—

(i) He averages the standard deviations of the arrays, and not
their squares, in estimating the value of $\sigma^2$.

(ii) He divides his total by N instead of N − a.

(iii) He enters Elderton’s table with $n' = a$ instead of
$n' = a - p$.

(iv) He takes the Type III distribution to be exact.

(i) Pearson [1, pp. 249-51] has criticized the first point, but
his practice is not quite explicit. In his opinion evidently, if the
surface is homoscedastic, we must take $\sigma_y^2(1 - \eta^2)$, but in the
special case when the regression is also linear he replaces $1 - \eta^2$
by $1 - r^2$. The point is not one of importance, and I am not
convinced that any material difference would be made by replacing
$1 - \eta^2$ by $1 - R^2$ in general, when the regression is well fitted.
There would seem to be no reason for treating linear differently
from other regression formulae. In dealing with Slutsky’s price
data, where the regression is doubtfully linear, Pearson prefers
to use $1 - r^2$.

(ii) The second point is, strictly, a matter of convenience, for
when we know the distribution of $\chi^2$, calculated by one method,
we also know its distribution in the second case. Since neither of
these distributions is exactly the Type III tabled by Elderton, we
are free to use whichever we please. The form we have chosen
has the advantage of involving the best estimate of $\sigma$, and we have
chosen it for this reason; but as in the Type VI distribution errors
of estimation are completely eliminated, this choice has only the
force of a convention. The close agreement of the curve we have
obtained with the corresponding Type III in the neighbourhood
of the median is a practical advantage; it should in any case be
noted that the corrections which we have obtained for P refer
only to our own form of the statistic $\chi^2$.

Although strictly a matter of convenience, there is a real
advantage when the matter is approached from other points of
view, in the use of the best estimates. Thus, for example, when
the arrays are undifferentiated, with respect to the distribution of
y, we naturally take

$$\frac{1}{N-1}\,S\,(y - \bar{y})^2$$

as the best estimate of the variance of the whole of the observation;
and as the arrays are undifferentiated this should agree on the
average with our estimate of the variance in each array.

Now

$$(1 - \eta^2)\,S\,(y - \bar{y})^2 = SS\{(y - \bar{y}_p)^2\};$$

whence it follows that the mean value of $1 - \eta^2$ is

$$\frac{N-a}{N-1},$$

and that of $\eta^2$, therefore,

$$\frac{a-1}{N-1}.$$

Pearson has discussed the distribution of $\eta$ in this case [5].



Observing that, even if the arrays are wholly undifferentiated,
$\eta$ will necessarily be positive, he points out that, in testing whether $\eta$
differs significantly from zero, it is not only necessary to know the
standard error of $\eta$, but also the mean value about which it varies.
The standard error of $\eta$ for undifferentiated arrays he had previously
[6] evaluated at $1/\sqrt{N}$, and he then by a somewhat intricate method
finds for the mean value of $\eta^2$ the value

$$\frac{a-1}{N},$$

and deduces that the mean value of $\eta$ will be

$$\sqrt{\frac{a-1}{N}},$$

the latter deduction being clearly a slip.

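In the case under consideration we have p = 0, R = 0, the
regression line fitted being $Y = \bar{y}$. Then

$$\chi^2 = (N - a)\,\frac{\eta^2}{1 - \eta^2}$$

will be distributed in the Type VI curve

$$df = \frac{\left(\frac{N-3}{2}\right)!}{\left(\frac{N-a-2}{2}\right)!\,\left(\frac{a-3}{2}\right)!}\;(N-a)^{-\frac{a-1}{2}}\;x^{\frac{a-3}{2}}\left(1 + \frac{x}{N-a}\right)^{-\frac{N-1}{2}}dx;$$

whence substituting for x, we find that $\eta^2$ is distributed in the
Type I curve

$$df = \frac{\left(\frac{N-3}{2}\right)!}{\left(\frac{N-a-2}{2}\right)!\,\left(\frac{a-3}{2}\right)!}\;(\eta^2)^{\frac{a-3}{2}}\,(1 - \eta^2)^{\frac{N-a-2}{2}}\,d\eta^2.$$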


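For large values of N the distribution of $\eta$ does not tend to
normality as Pearson supposed, but that of $\eta^2$ tends to a Type III
curve. For the mean values of $\eta$ and $\eta^2$ we have

$$\bar{\eta} = \frac{\left(\frac{a-2}{2}\right)!\,\left(\frac{N-3}{2}\right)!}{\left(\frac{a-3}{2}\right)!\,\left(\frac{N-2}{2}\right)!},$$

or, approximately,

$$\bar{\eta} = \ldots$$

while

$$\overline{\eta^2} = \frac{a-1}{N-1},$$

in agreement with our previous value.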

The mean value for $\eta^2$ thus agrees sufficiently with that obtained
by Pearson, but the accurate values for the mean and the standard
deviation differ from his values. There is no purpose for pressing
further a comparison on these lines, since, unless the number of
arrays be large, the distribution of $\eta$ is far from normal, and the
significance of an observed value of $\eta$ may be tested with some
accuracy by the use of $\chi^2$.

It may be noticed that, when the number of arrays is large, 



to a first approximation, of which the second factor may usually 
be ignored. 

(iii) The third point of difference between my method and those
of Slutsky and Pearson, whereby I have made allowance for the
number of constants involved in fitting the regression formula, has
been more fully explained in a recent paper [2].

It is there shown that if

$$\chi^2 = S\left\{\frac{(n_p - \bar{n}_p)^2}{\bar{n}_p}\right\},$$

where $\bar{n}_p$ is the number of observations expected, and $n_p$ the number
observed in any cell, then the value of $n'$ with which Elderton’s
table should be entered is not the total number of cells, but one
more than the number of values of

$$n_p - \bar{n}_p$$

which can be independently specified. That is to say, that when
the values of $\bar{n}_p$ are reconstructed from the data of the sample,
$(n' - 1)$ is the number of degrees of freedom left after making
this reconstruction.

In the same way for regression lines

$$\chi^2 = \frac{1}{\sigma^2}\,S\{n_p(\bar{y}_p - Y_p)^2\},$$

and, if a is the number of arrays, $n' - 1 = a$, only if the values of
$Y_p$ are assigned independently of the sample. If, as more usually
is the case, the values of $Y_p$ are those of a regression formula fitted
to the sample, the number of values of

$$\bar{y}_p - Y_p$$

which can be independently specified is reduced by the number of
constants fitted. For example, if a cubic polynomial has been
fitted, the number of degrees of freedom is (a − 4), so that
$n' = a - 3$.


6. The distribution of regression coefficients.

Hitherto we have only considered data in which a number of
values of y are observed corresponding in groups to identical
values of x; little statistical or physical data is strictly of this
form, although the former may in favourable cases be confidently
grouped, so as to simulate the kind of data for which the fitness
of regression lines may be tested. The limitation of our methods
to data of this form constitutes one of the most serious deficiencies
in the statistical methods so far available. The position is well
stated by Pearson [1, p. 258]:—

    “Of course it is needful for a test of this kind that the
    number of measurements of A, ‘the dependent variable,’
    should considerably exceed the number of values of B tested.
    It would fail entirely if only one value of A were taken for
    each value of B, however numerous the latter might be.
    We must have some basis on which to determine the error
    made in a single determination of A. This is a point,
    I think, often overlooked by the physicist. A fairly good
    determination—I mean a quantitative determination—of the
    goodness of fit of theory to observation could be made from
    ten series of eight observations of A corresponding to ten
    values of B. But no measure of goodness of fit could be
    obtained from eighty observations of A corresponding to
    eighty values of B, yet the latter system would probably
    make the greater appeal to most physicists. I do not see
    how quantitatively to obtain any measure of the goodness of fit
    of theory to observation in the latter method of procedure.”

It appears to the writer that the problem is one rather for the
statistician than for the physicist; for, given equally variable
arrays, and a regression line of known form, the problem is perfectly
objective. I emphasize it here as a problem awaiting solution—
a manageable solution of which would be of great practical utility.
That it is an objective problem is clear from the confidence with
which very bad fits will be rejected at sight, as also from the fact
that rough and common-sense methods of testing have been developed
for some purposes. [9, Fisher, 1921.]

Although exact methods of testing the goodness of fit of regression
lines are not available for the extended class of data, we are in a
position to give an exact solution of the distribution of the regression
coefficients. This problem has been outstanding for many years;
but the need for its solution was recently brought home to the
writer by correspondence with “Student,” whose brilliant researches
[7] in 1908 form the basis of the exact solution.

For consider a simple linear regression formula

$$Y = a + b(x - \bar{x}),$$

of which the coefficients a and b are calculated by the equations

$$a = \frac{1}{n}\,S(y), \qquad b = \frac{S\{y(x - \bar{x})\}}{S\,(x - \bar{x})^2};$$

we note first that a and b are orthogonal functions, in that given
the series of values of x observed, their sampling variation is
independent.

Now “Student” [7] has shown how the probable error of a
may be calculated; for if for a given value of x the standard
deviation of y is $\sigma$, then a will be normally distributed, so that

$$\sigma_a^2 = \frac{\sigma^2}{n}.$$

So that if $\alpha$ is the population value of a, and

$$\tau = \frac{(a - \alpha)\sqrt{n}}{\sigma},$$

then $\tau$ is normally distributed about zero with standard deviation
unity. If $\sigma$ is unknown, the best estimate that can be made of it
from the sample is

$$s^2 = \frac{1}{n-2}\,S\,(y - Y)^2,$$

where the sum is divided by (n − 2) to allow for the two constants
used in fitting the regression line. Then the distribution of $s^2$ is,
if

$$\chi^2 = (n - 2)\,\frac{s^2}{\sigma^2},$$

$$df = \frac{1}{\left(\frac{n-4}{2}\right)!}\left(\frac{\chi^2}{2}\right)^{\frac{n-4}{2}} e^{-\frac{1}{2}\chi^2}\,d\!\left(\frac{\chi^2}{2}\right).$$

The distributions of the two quantities s and a are wholly independent;
hence, following “Student,” we find the distribution of a quantity
completely calculable from the sample, namely,

$$z = \frac{\tau}{\chi} = \frac{(a - \alpha)\sqrt{n}}{\sqrt{S\,(y - Y)^2}}.$$



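For

$$df = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}\tau^2}\,d\tau \cdot \frac{1}{\left(\frac{n-4}{2}\right)!}\left(\frac{\chi^2}{2}\right)^{\frac{n-4}{2}} e^{-\frac{1}{2}\chi^2}\,d\!\left(\frac{\chi^2}{2}\right)$$

$$= \frac{1}{\sqrt{\pi}} \cdot \frac{1}{\left(\frac{n-4}{2}\right)!}\left(\frac{\chi^2}{2}\right)^{\frac{n-3}{2}} e^{-\frac{1}{2}\chi^2(1 + z^2)}\,d\!\left(\frac{\chi^2}{2}\right)dz,$$

and integrating with respect to $\chi^2$ from 0 to $\infty$, we have

$$df = \frac{1}{\sqrt{\pi}}\;\frac{\left(\frac{n-3}{2}\right)!}{\left(\frac{n-4}{2}\right)!}\;\frac{dz}{(1 + z^2)^{\frac{n-1}{2}}},$$

the Type VII curve obtained by “Student,” with n reduced by
unity, since we have fitted a regression line of the first degree.

Similarly, for b,

$$\sigma_b^2 = \frac{\sigma^2}{S\,(x - \bar{x})^2},$$

and if

$$z = \frac{(b - \beta)\sqrt{S\,(x - \bar{x})^2}}{\sqrt{S\,(y - Y)^2}},$$

we arrive at the same distribution as before, $\beta$ being the population
value of the regression coefficient.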
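A minimal sketch of the quantity z for the slope of a straight line, together with the Type VII density above, may make the procedure concrete. The function names are hypothetical. The returned value $t = z\sqrt{n - 2}$ is the form in which the same test is usually tabulated today, with n − 2 degrees of freedom; that identification is added here for orientation and is not part of the text.

```python
import math


def slope_z(xs, ys, beta=0.0):
    """Return (b, z, t) for the fitted line Y = a + b (x - x_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    a = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum(y * (x - x_bar) for x, y in zip(xs, ys)) / s_xx
    # S(y - Y)^2, the sum of squared residuals from the fitted line
    resid_ss = sum((y - (a + b * (x - x_bar))) ** 2 for x, y in zip(xs, ys))
    z = (b - beta) * math.sqrt(s_xx) / math.sqrt(resid_ss)
    t = z * math.sqrt(n - 2)
    return b, z, t


def type_vii_pdf(z, n):
    """Density of z for a straight line fitted to n observations."""
    log_const = (math.lgamma((n - 1) / 2.0) - math.lgamma((n - 2) / 2.0)
                 - 0.5 * math.log(math.pi))
    return math.exp(log_const - (n - 1) / 2.0 * math.log1p(z * z))


print(slope_z([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))
```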

The above argument immediately extends itself to regression
lines of any form and involving any number of coefficients. For,
suppose the regression equation is of the form

$$Y = a + bX_1 + cX_2 + \ldots + kX_p,$$

where $X_1, X_2, \ldots, X_p$ are orthogonal functions of x for the observed
values, so that

$$S(X_p X_q) = 0$$

—in the most important case, $X_p$ will be a polynomial in x, of degree p,
orthogonal to the polynomials of lower degree [9]—then, for
example,

$$k = \frac{S(X_p\,y)}{S(X_p^2)},$$

and

$$\sigma_k^2 = \frac{\sigma^2}{S(X_p^2)}.$$

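Also, if

$$s^2 = \frac{1}{n-p-1}\,S\,(y - Y)^2,$$

the distribution of s is given by

$$df = \frac{1}{\left(\frac{n-p-3}{2}\right)!}\left(\frac{\chi^2}{2}\right)^{\frac{n-p-3}{2}} e^{-\frac{1}{2}\chi^2}\,d\!\left(\frac{\chi^2}{2}\right), \qquad \chi^2 = (n - p - 1)\,\frac{s^2}{\sigma^2}.$$

Consequently, if

$$z = \frac{(k - \kappa)\sqrt{S(X_p^2)}}{\sqrt{S\,(y - Y)^2}},$$

$\kappa$ being the population value of k, the distribution of z is the
Type VII curve

$$df = \frac{1}{\sqrt{\pi}}\;\frac{\left(\frac{n-p-2}{2}\right)!}{\left(\frac{n-p-3}{2}\right)!}\;\frac{dz}{(1 + z^2)^{\frac{n-p}{2}}};$$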

and in this case, when p + 1 constants have been fitted, all the
other regression coefficients will be distributed in like manner,
only substituting the corresponding function of x for $X_p$.

Tables of the Probability Integral of the above Type VII distribution
have been prepared by “Student” [8], for values of n − p
from 0 to 30. These tables are in a suitable form for testing the
significance of an observed regression coefficient. For larger
samples the curve will be sufficiently normal for most purposes,
the variance of z being

$$\frac{1}{n - p - 3}.$$

The utility of “Student’s” curve for the distribution of errors
in the mean of a sample, in terms of the standard deviation, as
estimated from the same sample, is increased by the circumstance
that the same distribution also gives that of differences between
such means. Thus, if $\bar{x}$ and $\bar{x}'$ are the means of samples of n and $n'$,
and we wish to test if the means are in sufficient agreement to
warrant the belief that the samples are drawn from the same
population, we may calculate

$$z = \frac{\bar{x} - \bar{x}'}{\sqrt{S\,(x - \bar{x})^2 + S'(x - \bar{x}')^2}}\;\sqrt{\frac{n\,n'}{n + n'}};$$

then z will be distributed so that

$$df = \frac{1}{\sqrt{\pi}}\;\frac{\left(\frac{n+n'-3}{2}\right)!}{\left(\frac{n+n'-4}{2}\right)!}\;\frac{dz}{(1 + z^2)^{\frac{n+n'-1}{2}}}.$$
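The comparison of two means can be sketched in the same way. The function name `two_sample_z` is hypothetical; the second returned value, $t = z\sqrt{n + n' - 2}$, is the familiar pooled two-sample form, noted here only as a cross-check on the formula for z.

```python
import math


def two_sample_z(first, second):
    """Return (z, t) for testing whether two samples share a population mean."""
    n, n2 = len(first), len(second)
    mean1 = sum(first) / n
    mean2 = sum(second) / n2
    # S(x - x_bar)^2 + S'(x - x_bar')^2, pooled over both samples
    pooled_ss = (sum((x - mean1) ** 2 for x in first)
                 + sum((x - mean2) ** 2 for x in second))
    z = (mean1 - mean2) / math.sqrt(pooled_ss) * math.sqrt(n * n2 / (n + n2))
    t = z * math.sqrt(n + n2 - 2)
    return z, t


print(two_sample_z([1.0, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]))
```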

This method of comparison may be applied directly to regression 
coefficients, when the same series of values of x is observed in each 
case. 

The above problem in which the errors of the coefficients of
a regression of any form are considered, is in reality a special case
of the multiple regression surface—special in the sense that with
a single variable we can conveniently choose the terms of the
regression equation, so that the several terms consist of uncorrelated
functions. When this is not the case we have such a regression
system as

$$Y = b_1 x_1 + b_2 x_2 + \ldots + b_p x_p,$$

when $x_1, x_2, \ldots, x_p$ are p independent variables, with certain
mutual correlations. The accuracy of the regression coefficients
is only affected by the correlations which appear in the sample, so
that if we construct the determinant

$$\Delta = \begin{vmatrix} S(x_1^2) & S(x_1 x_2) & \ldots & S(x_1 x_p) \\ S(x_1 x_2) & S(x_2^2) & \ldots & S(x_2 x_p) \\ \ldots & \ldots & \ldots & \ldots \\ S(x_1 x_p) & S(x_2 x_p) & \ldots & S(x_p^2) \end{vmatrix}$$

from the values of the sample, then

$$\sigma_{b_1}^2 = \sigma^2\,\frac{\Delta_{11}}{\Delta},$$

where $\Delta_{11}$ is the minor of $S(x_1^2)$.

Consequently, if

$$z = \frac{b_1 - \beta_1}{\sqrt{S\,(y - Y)^2}}\;\sqrt{\frac{\Delta}{\Delta_{11}}},$$

then, as before, z will be distributed in the Type VII distribution

$$df = \frac{1}{\sqrt{\pi}}\;\frac{\left(\frac{n-p-2}{2}\right)!}{\left(\frac{n-p-3}{2}\right)!}\;\frac{dz}{(1 + z^2)^{\frac{n-p}{2}}}.$$
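For two independent variables the determinant and its minor can be written out by hand. The sketch below assumes that every variable is measured from its sample mean and that z takes the form $(b_1 - \beta_1)\sqrt{\Delta/\Delta_{11}}\,/\sqrt{S(y - Y)^2}$, by analogy with the single-variable case; the function name `first_coefficient_z` is hypothetical.

```python
import math


def first_coefficient_z(x1, x2, y, beta1=0.0):
    """Return (b1, z) for the first coefficient of a two-variable regression."""
    n = len(y)
    # Measure every variable from its sample mean (an assumption of this sketch).
    d1 = [v - sum(x1) / n for v in x1]
    d2 = [v - sum(x2) / n for v in x2]
    dy = [v - sum(y) / n for v in y]

    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))

    delta = s11 * s22 - s12 * s12      # the determinant
    delta_11 = s22                     # minor of S(x1^2)
    b1 = (s22 * s1y - s12 * s2y) / delta
    b2 = (s11 * s2y - s12 * s1y) / delta

    resid_ss = sum((v - b1 * u1 - b2 * u2) ** 2
                   for u1, u2, v in zip(d1, d2, dy))
    z = (b1 - beta1) * math.sqrt(delta / delta_11) / math.sqrt(resid_ss)
    return b1, z


print(first_coefficient_z([-1.0, 1.0, -1.0, 1.0],
                          [-1.0, -1.0, 1.0, 1.0],
                          [0.0, 2.0, 1.0, 4.0]))
```

When the two variables are uncorrelated in the sample, $\Delta/\Delta_{11}$ reduces to $S(x_1^2)$ and the expression agrees with that for orthogonal functions.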


Conclusions.

(1) In testing the fitness of regression lines account must be
taken of the number of degrees of freedom which have been
absorbed in the process of fitting.

(2) The Type III distribution of Elderton’s tables is not exact
for testing regression lines, but the tables may be used as a basis
of a useful approximation.

(3) The exact distribution of $\chi^2$ is given by a curve of the
Pearsonian Type VI, which for large samples approaches the Type III
distribution.

(4) For undifferentiated arrays the distribution of $\eta^2$ is given by
a curve of the Pearsonian Type I; for large samples this curve
approaches the Type III distribution.

(5) The distribution in random samples of a great variety of
regression coefficients may be treated by the method introduced
by “Student” for the distribution of the mean of a normal sample,
and as in that case lead to a distribution curve of the Pearsonian
Type VII, which for large samples rapidly approaches normality.

The importance of the last result is considerable. It shows 
that a number of regression coefficients may be safely calculated 
from a sample of moderate size. Thus, in studying relations of a 
complex kind, such as occur in agricultural meteorology, it is 
useful to know that we may as accurately determine thirty 
coefficients from a sample of sixty sets of observations as we may 
calculate a single coefficient, or mean, from a sample of thirty-one 
observations. 
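The arithmetic behind this comparison may be written out in terms of the degrees of freedom left for estimating the error, taking n as the number of observations and p + 1 as the number of constants fitted, as in Section 6. This is a reading of the statement rather than a quotation from it.

```latex
\begin{aligned}
\text{thirty coefficients, sixty observations:}\quad & n - (p + 1) = 60 - 30 = 30,\\
\text{a single mean, thirty-one observations:}\quad & n - 1 = 31 - 1 = 30.
\end{aligned}
```

Both cases therefore lead to the same Type VII curve, with n − p = 31.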


References.

1. K. Pearson (1916).—“On the Application of Goodness of Fit Tables to
Test Regression Curves and Theoretical Curves used to describe Observational
or Experimental Data.” Biom., XI, 239-61.

2. R. A. Fisher (1922).—“On the Significance of $\chi^2$ from Contingency
Tables, and on the Calculation of P.” J.R.S.S., LXXXV, pp. 87-94.

3. R. A. Fisher (1916).—“Frequency Distribution of the Values of the
Correlation Coefficient in Samples from an Indefinitely Large Population.”
Biom., X, 507-21.

4. E. Slutsky (1913).—“On the Criterion of Goodness of Fit of the Regression
Lines, and on the best Method of fitting them to the Data.” J.R.S.S.,
LXXVII, 78-84.

5. K. Pearson (1911).—“On a Correction to be made to the Correlation
Ratio.” Biom., VIII, 254-6.

6. K. Pearson (1905).—“On the General Theory of Skew Correlation and
Non-linear Regression.” Drapers' Company Research Memoirs, Dulau and Co.

7. Student (1908).—“The Probable Error of a Mean.” Biom., VI, pp. 1-25.

8. Student (1917).—“Tables for Estimating the Probability that the Mean
of a unique Sample of Observations lies between $-\infty$ and any given Distance
of the Mean of the Population from which the Sample is drawn.” Biom., XI,
414-17.

9. R. A. Fisher (1921).—“An Examination of the Yield of Dressed Grain
from Broadbalk.” Journal of Agricultural Science, XI, 107-35.





7 

STATISTICAL TESTS OF AGREEMENT BETWEEN
OBSERVATION AND HYPOTHESIS


AUTHOR’S NOTE

Papers 7 and 8 are attempts to reconcile, with the aid of the new concept
of degrees of freedom, the discrepant and anomalous results observed
by different authors, in the first case when confronted by
data of a fourfold table, and in the second with distributions requiring
the fitting of parameters. The types of confusion which had
arisen are of some historical interest.


Reprinted from Economica, No. 8, pp. 1-9, 1923. 





Statistical Tests of Agreement 
between Observation and Hypothesis 

By R. A. Fisher. 

1—Introductory.

In a recent number (January, 1922) of the Journal of the Royal Statistical
Society (Ref. 1) I put forward a proof that the distribution of
$\chi^2$, the Pearsonian test of goodness of fit, is not known merely from
the number of frequency classes. In cases where the population,
with which the sample is compared in calculating $\chi^2$ has been itself
reconstructed from the sample, we must also take account of the
number of degrees of freedom absorbed in this process of reconstruction.
The two cases of widest application were (i) contingency
tables in which the population is reconstructed by assigning to the
margins the frequencies observed in the sample, and (ii) frequency
curves constructed to agree with the sample in respect of one or
more moments. The common sense of this correction lies in the
fact that when the population with which the sample is compared
has been artificially identified with the sample in certain respects,
such as the marginal frequencies, or the moments, we shall evidently
make an exaggerated estimate of the closeness of agreement between
sample and population, if we regard the sample as an unselected
sample of a population known a priori. It was possible to show
that the distribution was in fact that which arises when from any
population a large number of samples are taken, and only those
samples chosen which agree with the population in (say) the marginal
frequencies; these samples compared to the true population will
give values of $\chi^2$ distributed in the same manner as in the practical
case in which we compare any sample with a population artificially
constructed from it. In both these cases the value of $n'$ with which
Elderton's Table should be entered is found by adding unity to the
number of degrees of freedom in which the sample and the population
are free to differ.

The following scheme shows the various forms of sampling which
have been considered in this discussion, and which have not always
been clearly distinguished.

                            Random Sample.              Selected Sample.

True population             A.—No correction needed.    B.—Correction needed.

Reconstructed population    C.—Correction needed.

A is the type of sampling considered by
Professor Pearson in his memoir of 1900 (2); when the distribution
of a random sample is given by the frequencies $x_1, x_2, \ldots$, while the
distribution of the true population from which the sample was drawn
is given by $m_1, m_2, \ldots$, when S(x) = S(m), then Pearson showed that

$$\chi^2 = S\left\{\frac{(x - m)^2}{m}\right\}$$

was distributed in the Type III distribution made available by
Elderton's Table. The true distribution of $\chi^2$ for any finite value
of S(x) is, of course, discontinuous, but when the frequencies in all
the cells are fairly large, the distribution of $\chi^2$ is nearly independent
of S(x), and tends to that of the Type III curve. The distribution
still depends on the number of cells, and this number is equated to
$n'$ in entering Elderton's Table.

Although the validity of Pearson's method of testing goodness of 
fit (in case A) is not universally accepted, I know of no published 
criticism to which it has been exposed, and personally do not 
question its correctness when the frequencies are sufficiently large. 

In case B there is also, so far as I am aware, no disagreement. 
Pearson (3), in 1916, showed that in this case a complete correction 
was possible by entering Elderton's Table with a value of n' less than 
that of the frequency classes by the number of linear restrictions 
imposed on the sample. 

The really important case is C, in which the theoretical distribu¬ 
tion is unknown, and is reconstructed from the marginal values of 
the sample ; this case is by far the most frequent in actual practice. 
It was the contention of my paper of January, 1922, that this case 
is equivalent to B, for the same relations are established between 
the sample and the population to which it is compared, either by 
selecting such samples as agree with a known population, or by 
comparing a sample chosen at random with a population constructed 
to agree with it in the same respects. 


2 —Fourfold Tables. 

In any fourfold table in which the marginal totals agree with those
of the population to which it is compared in the calculation of χ²,
the differences (x − m) have the same value, positive or negative,
in all four quadrants. The magnitude of this difference is clearly
a measure of the departure of the sample from expectation, and if
it is divided by its standard error of random sampling, we find the
same value χ which appears in the Pearsonian test of goodness of
fit. From this it would appear that, with large samples, χ tends
to be normally distributed in the distribution

$$\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\chi^2}\, d\chi,$$

which is identical with the Type III distribution of Elderton's
Table for n' = 2.




Using this fact, Bowley (5, p. 371) tests the significance of two
contingency tables; this was so far as I know the only instance
previous to my note of January, 1922, in which the value of χ² had
been correctly used to calculate P for testing the independence of
the variates in a fourfold table. It may easily be shown in the same
way that if x be any measure of divergence from proportionality in
the case of a fourfold table, such as the difference of the percentages,
then

$$\chi = \frac{x}{\sigma_x}.$$

For the differences of percentages see (1). For other measures I
may quote Pearson (6, p. 29):

$$\chi = \frac{r_{hk}}{{}_0\sigma_r} = \frac{Q}{{}_0\sigma_Q} = \frac{\phi}{{}_0\sigma_\phi},$$

where r_hk, Q, φ, are all measures of the departure of the observed
table from independence, and ₀σ_r, etc., are the standard errors of
random sampling for a population in which the variates are independent.
Pearson, however, denies that χ is normally distributed,
on the ground that from Elderton's Table with n' = 4, its distribution
should be

$$\sqrt{\frac{2}{\pi}}\, \chi^2\, e^{-\frac{1}{2}\chi^2}\, d\chi.$$

If I am right, Pearson was misled by assuming that n' = 4 gave
the correct distribution when the marginal frequencies of the population
are reconstructed from the sample, into the wholly untenable
view that Q, φ, and other such measures, are not normally
distributed even in large samples, but that their distribution is that
given above. Since each of these quantities may be defined by a
natural convention so as to be equally frequently positive and
negative, the latter view involves the belief that their distribution
is bimodal, with a zero frequency at the central point of the symmetrical
distribution.

The proof that for large samples the distribution of these quanti¬ 
ties is normal is not difficult; in the fourfold table 



I have shown (i) that _ = 





It is necessary now to show that the distribution of x will tend to
normality as the sample is increased. Considering the sub-sample
a, b, the distribution of a will be that of the binomial; hence the
distribution of a/(a + b) tends to normality as a + b is increased
indefinitely. Similarly that of c/(c + d) tends to normality
with the same mean, but different standard deviation. Moreover,
from a population in which the two attributes are independent,
these will be two independent samples; and it is well known that
the distribution of the difference of two independent normally
distributed variates is itself normally distributed. Consequently,
x tends to be normally distributed about mean at zero; if then

$$\chi = \frac{x}{\sigma_x},$$

χ must be normally distributed with standard deviation unity,
while for its positive value the distribution is

$$\sqrt{\frac{2}{\pi}}\, e^{-\frac{1}{2}\chi^2}\, d\chi$$

and not

$$\sqrt{\frac{2}{\pi}}\, \chi^2\, e^{-\frac{1}{2}\chi^2}\, d\chi.$$

The difference of opinion is a simple one of fact. If I am right,
the mean value of χ² is 1; if Professor Pearson is right, it is 3.
Professor Bowley, on the other hand, regards two assumptions as
possible : In speaking of a table showing Inoculated—Not inocu¬ 
lated, Recovered—Died (4, Doubtful Case, p. 4), he writes: 

“If we applied the method of Case III (red and black cards) we
should be assuming the total number recovered, inoculated, etc.,
were given, and ask whether, if recovery and inoculation were
totally unconnected, so large a number would be found by
chance in the first compartment.

“If we applied the method of Case IV we should be assuming
that we were examining only a sample from a larger universe in
which the proportions recovered : died and inoculated : not
inoculated were not known.”

It is difficult to know what meaning is to be attached to this dis¬ 
tinction. Professor Bowley does not explain what difference there 
is between this case and that treated in his book (5, p. 372), where 
the divisions are Recovered—Died ; Not Vaccinated—Vaccinated, 
for which, rightly in my opinion, he takes χ normally distributed.
Nor does he explain why this case is considered doubtful, as
against Case IV (Dull—Not Dull; With defects—Without) where 
he, wrongly in my opinion, uses the formula n' = 4. In all these
cases the marginal totals in the sample are given, and in the population
are unknown, and since the latter in all these cases has been
reconstructed from the former, the sample can only differ from the
population in one degree of freedom. No assumption as to what we
know or do not know can alter the consequences of our procedure
in calculating χ²; the procedure is the same in all these cases.(8)


3 —χ² Function of Frequencies.


That the distribution of χ² from random samples is determined
solely by the procedure by which it is calculated may be emphasised
by returning to first principles. In a fourfold table let the probabilities
that an event shall fall into each of the four compartments
be p₁, p₂, p₃, p₄. Then

$$p_1 + p_2 + p_3 + p_4 = 1,$$

and if the variates are independent

$$p_1 p_4 = p_2 p_3,$$

so that we may write

$$p_1 = p\,p', \quad p_2 = p\,q', \quad p_3 = q\,p', \quad p_4 = q\,q'.$$

If now a sample of n be taken, the chance that the number of
observations in the four compartments are a, b, c, d will be the term
of the multinomial expansion

$$\frac{n!}{a!\, b!\, c!\, d!}\; p_1^a\, p_2^b\, p_3^c\, p_4^d.$$


Since the simultaneous distribution of a, b, c, d is thus determinate
in terms of p₁, p₂, p₃, p₄, and n, the distribution of any function of
a, b, c, and d, is also determinate. Two such functions may be
considered; in the first place let

$$\chi^2 = \frac{a^2}{p_1 n} + \frac{b^2}{p_2 n} + \frac{c^2}{p_3 n} + \frac{d^2}{p_4 n} - n; \qquad (A)$$

this is the value χ² used when p₁, p₂, p₃, p₄ are known by hypothesis
and we desire to test if the observation a, b, c, d is in accordance
with that hypothesis. It is agreed that this function tends to
be distributed as n is increased, in a manner independent of p₁,
p₂, p₃, p₄ and n, in the distribution given in Elderton's Table under
n' = 4.

When, however, p₁, p₂, p₃, p₄ are not known, and it is required to
test, not any particular hypothesis of their values, but whether the
two variates are or are not distributed independently, then we
make the substitution

$$p_1 = \frac{(a + b)(a + c)}{n^2}, \quad \text{etc.},$$

and so arrive at the formula

$$\chi^2 = \frac{(ad - bc)^2\, n}{(a + b)(c + d)(a + c)(b + d)}. \qquad (B)$$




This is a different function of a, b, c, d from that previously used,
and the error of using the formula n' = 4, in testing independence,
consists in assuming, when a, b, c, d take all their possible values
with frequencies given by the multinomial formula, that this new
function is distributed in the same manner as the function previously
used.

It will scarcely be disputed, after what has been already said, that
if χ is calculated by equation B, then it will be distributed in random
samples in a normal curve with standard deviation unity. If neces¬ 
sary a formal and complete proof of this fact can be given. It is 
sufficient here to point out the difference between the functions 
A and B, and to emphasize the fact that the distribution found by 
Pearson for function A cannot be assumed to be correct if B is the 
function actually employed. 
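
The distinction between the two functions can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, and the table (10, 20, 30, 40) is invented for the example. It shows that equation B is what equation A becomes when the probabilities are reconstructed from the margins of the sample.

```python
import math


def chi2_known(a, b, c, d, p1, p2, p3, p4):
    """Equation A: chi-square against probabilities given by hypothesis."""
    n = a + b + c + d
    return a * a / (p1 * n) + b * b / (p2 * n) + c * c / (p3 * n) + d * d / (p4 * n) - n


def chi2_independence(a, b, c, d):
    """Equation B: chi-square for independence in a fourfold table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_reconstructed(a, b, c, d):
    """Equation A with p1..p4 rebuilt from the sample margins."""
    n = a + b + c + d
    p1 = (a + b) * (a + c) / n**2
    p2 = (a + b) * (b + d) / n**2
    p3 = (c + d) * (a + c) / n**2
    p4 = (c + d) * (b + d) / n**2
    return chi2_known(a, b, c, d, p1, p2, p3, p4)


if __name__ == "__main__":
    table = (10, 20, 30, 40)  # invented counts a, b, c, d
    print(chi2_independence(*table), chi2_reconstructed(*table))
```

The two printed values agree, which is the algebraic identity behind equation B; the sampling distribution, however, is that of one degree of freedom, not three.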

The differences between the two functions A and B are, in fact,
very great. A is a function of the probabilities p₁, p₂, p₃, p₄, and
can only be used if these are provided by the hypothesis to be
tested; it is distributed in accordance with the formula n' = 4,
and its mean value is 3. On the other hand, B is a function of the
observed frequencies only, and is used to test independence, that is,
when our hypothesis tells us no more than that p₁p₄ = p₂p₃; it is
distributed in accordance with the formula n' = 2, and its mean
value is 1. To enter Elderton's Table under n' = 4, with a value
of χ² calculated by equation B, is not to make a test of independence,
for such a value of χ² is not in fact distributed in the distribution for
which the table n' = 4 was calculated. The effect of this error is
greatly to over-estimate the agreement of the observed sample
with expectation, and correspondingly to under-estimate the significance
of discrepancies from expectation based on the hypothesis
of independent variates. Thus for χ² calculated by equation B the
value obtained from random samples with independent variates
will exceed 4 in only 4.55 times to 100 trials. An observed value
χ² = 4, therefore, strongly suggests that the variates are not
independent. But if χ² be obtained from equation A, its value
from random samples with independent variates will exceed 4 as
many as 26.15 times to 100 trials; no significant departure from
expectation could be inferred from such a value; this shows how
misleading it may be to calculate χ² by equation B, and at the same
time to assume the distribution to be that of the function given in
equation A.
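
The two percentages quoted above can be checked from the closed forms of the χ² tail for one and three degrees of freedom. The sketch below uses only the Python standard library; the function names are hypothetical.

```python
import math


def chi2_tail_1df(x):
    """P(chi-square > x) for one degree of freedom (n' = 2)."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_tail_3df(x):
    """P(chi-square > x) for three degrees of freedom (n' = 4)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


if __name__ == "__main__":
    print(f"n' = 2: {100 * chi2_tail_1df(4.0):.2f} per cent.")
    print(f"n' = 4: {100 * chi2_tail_3df(4.0):.2f} per cent.")
```

The same observed value, χ² = 4, is thus significant at about the 5 per cent. level under n' = 2 and quite unremarkable under n' = 4.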

4—Yule's Experiments. 

It is the more surprising that Bowley should have reverted to
the Pearsonian mode of testing fourfold tables since the actual
distribution of χ² in this case has been determined experimentally
by Yule. Yule's experiment was designed to settle the question of
the distribution of χ² in the fourfold table in Case C, where the
population is reconstructed from the sample. He also calculated
χ² from the known population and verified that in this case the
Pearsonian formula n' = 4 was correct. No less than 350 observations
were made. The distributions of the values of χ² calculated
from the reconstructed populations were as follows (7, p. 100):






















































































References 

1. R. A. Fisher (1922). On the interpretation of χ² from contingency
tables, and on the calculation of P. J.R.S.S., LXXXV,
pp. 87-94.

2. K. Pearson (1900). On the criterion that a given system of
deviations from the probable in the case of a correlated system of
variables is such that it can be reasonably supposed to have arisen
from random sampling. Phil. Mag., L, pp. 157, etc.

3. K. Pearson (1916). On the general theory of multiple contingency
with special reference to partial contingency. Biom., XI,
pp. 145-158.

4. A. L. Bowley and L. R. Connor (1923). Test of correspondence
between statistical grouping and formulae. Economica,
No. 7, pp. 1-9.

5. A. L. Bowley (1920). Elements of Statistics. P. S. King and
Son, London.

6. K. Pearson (1912). On a novel method of regarding the
association of two variates classed solely in alternate characters.
Drapers Company Research Memoirs, Biometric Series VII.

7. G. Udny Yule (1922). On the application of the χ² method
to association and contingency tables, with experimental illustrations.
J.R.S.S., LXXXV, pp. 95-104.

Note. Professor Pearson has since admitted (Biometrika, XIV, p.
418) that Greenwood and Yule's tables (Inoculated—Not inoculated,
Attacked—Not attacked) for Typhoid and Cholera are correctly
treated by taking n' = 2. Presumably the same rule may now be
allowed for other diseases. Professor Pearson has, however, opened
a new and unexpected line of defence by claiming that these tables
are not fourfold tables at all. It is difficult to be certain what distinction
is in view; the only distinction mentioned is that “they”
(Greenwood and Yule) “have arbitrarily fixed by the size of their
inoculated and uninoculated groups two of the marginal totals.” To
avoid confusion of thought three points may be noted: (i) Greenwood
and Yule did not in any sense fix the numbers inoculated and
uninoculated, but accepted all suitable data reported; (ii) if the
data had referred to experimental conditions in which the proportion
of inoculated to uninoculated could be assigned at will, this
circumstance would have made no difference to the distribution of
χ², since the marginal proportion in the population with which the
sample is compared is, in any case, identified with that of the sample;
(iii) the proportion of inoculated to uninoculated involves only one
degree of freedom; in order to diminish the degrees of freedom from
3 to 1 it would be necessary, on Professor Pearson's argument, for
Greenwood and Yule to fix, equally arbitrarily, the numbers attacked
and not attacked by the epidemics. R.

[In regard to Mr. Fisher's Cases A and B no doubt has arisen.
"A" is Prof. Pearson's original problem, and Case I in the article
in the last issue of Economica; "B," where the marginal totals
(or the moments) are fixed and the same for all samples, is that
treated by Mr. Fisher generally in the Statistical Journal, 1922,




Economica. 

The whole difficulty lies in Case C, which corresponds to Cases
II and IV. In Mr. Yule's experiment and in Mr. Fisher's treatment,
the marginal totals are not kept constant (as they are in Elements,
pp. 371-2), and the reconstructed population is adjusted for each
sample (which is not done in Case IV); for in each of Mr. Yule's
350 samples the numbers occurring were compared with a population
with the same marginal totals as in that sample. Mr. Fisher
indicates, but does not give explicitly, a proof that the theoretical
distribution of χ² in such a reckoning is very close to that found by
Mr. Yule; perhaps he should have emphasized that this is not merely
a corollary to his important proof of the right treatment of Case B.

The problem is, to find the chance that so great a divergence
from proportionality as is observed would be found in a random
selection from an uncorrelated population. The solution given in
Economica was that, if the proportions in the population were
p₁, p₂, p₃, p₄, the result is that given above in connection with Mr.
Fisher's formula (A). It was shown that the chance is greatest
when p₁ = (a + b)(a + c)/n², etc., but it was not supposed that
p₁, p₂, ... varied from experiment to experiment as in connection
with formula (B); the supposition was that many samples were
taken from the unchanged selected population.

The dispute is not about the mathematics; the doubt is whether
the variation of samples supposed in Economica, or that supposed
by Mr. Fisher and Mr. Yule, is appropriate to the problem. Prof.
Pearson (Biometrika, XIV, p. 189) may be cited in favour of the
Economica supposition. A. L. Bowley.]


8.441a 


8 

THE CONDITIONS UNDER WHICH CHI
SQUARE MEASURES THE DISCREPANCY
BETWEEN OBSERVATION AND
HYPOTHESIS *


See Author's Note, Paper 7.


* Reprinted from Journal of the Royal Statistical Society, Vol. LXXXVII, Pt.
III, pp. 442-450, 1924.


The Conditions under which χ² Measures the Discrepancy
between Observation and Hypothesis.

By R. A. Fisher, M.A.

1. Introductory.

The interesting series of experiments on the distribution of χ²,
reported by Dr. Brownlee [1] affords an opportunity, not only of
clearing up such doubts as still remain as to the necessity of entering
Elderton's table with a corrected, or reduced, value of n', but also
of bringing the conditions under which χ² affords a measure of
goodness of fit into relation with the general theory of statistical
estimation.

If x is the frequency of observations in any compartment of a
frequency distribution, and if m is the expectation in that compartment,
Pearson introduced ([2] 1900) the statistic

$$\chi^2 = S\left\{\frac{(x - m)^2}{m}\right\}$$

as a measure of the discrepancy between observation and expectation.
He succeeded in calculating the distribution of χ², when the values
of x were the frequencies in random samples from an infinite population
in which the frequencies were proportional to m, and showed
that the distribution of χ² depends, in the limit when the samples are
large, only on the number of classes, n', into which the samples were
divided. In the same paper Pearson considered the possibility that
when the values of m are not a priori expectations, but are themselves
calculated from the observed values, the distribution of χ² might
be modified by this procedure. He concluded that this was not so,
and applied the test without correction to several examples in which
the expectations in the several classes had been calculated from the
distribution in the sample.

In 1922 [3] I was able to show, in the case of contingency tables,
for which the margins of the expected table are reconstructed from
those of the observed table, that the distribution of χ² was given
exactly by Pearson's formula if we take for n', not the number of
classes in the table, but one more than the number of degrees of
freedom in which the expected table might differ from the values
observed. The number of degrees of freedom is the number of
frequencies which may be given arbitrary values without conflicting
with the condition that the marginal totals are already specified.
Thus, for a contingency table with two variates having r rows and
c columns, n' should be equated, not to cr, but to 1 + (c − 1)(r − 1).
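
The rule just stated is easily put into a small helper. This is a sketch only; the function names are hypothetical and merely restate the rule n' = 1 + (c − 1)(r − 1).

```python
def degrees_of_freedom(rows, cols):
    """Cells of an r x c table that may be filled freely once the margins are fixed."""
    return (rows - 1) * (cols - 1)


def elderton_n_prime(rows, cols):
    """Corrected value of n' with which Elderton's table is to be entered."""
    return 1 + degrees_of_freedom(rows, cols)


if __name__ == "__main__":
    for r, c in [(2, 2), (3, 4)]:
        # Uncorrected entry would be n' = r * c.
        print(r, c, "corrected n' =", elderton_n_prime(r, c), "uncorrected n' =", r * c)
```

For the fourfold table this gives n' = 2 in place of 4.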




In the same paper I expressed the opinion that the same reasoning 
should be applied to testing the goodness of fit of frequency curves,
but that some discrepancy would arise if the grouping used in 
calculating the theoretical distribution were different from that 
employed in testing the goodness of fit. 

Dr. Brownlee, after verifying the accuracy of the distribution
with corrected n', in several instances, considers a coin-tossing
experiment, in which he has obtained 32 samples in each of which
256 observations are distributed in the five classes, 4 heads, 3 heads,
2 heads, 1 head, no head. He finds that when the observations are
compared to the theoretical distribution, given by the expansion of

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^4,$$

the values of χ² obtained agree well with expectation for 4 degrees of
freedom (n' = 5); also that when compared to the theoretical
distribution

$$(p + q)^4, \quad p + q = 1,$$

where p is obtained from the observations, by making χ² a minimum,
the observed values of χ² agree with expectation for 3 degrees of
freedom (n' = 4); but when the comparison is made with areas
of a normal curve calculated by moments, using Sheppard's correction,
in which calculation 2 degrees of freedom, representing the mean
and standard deviation, are involved, the values of χ² do not at all
conform to expectation for 2 degrees of freedom (n' = 3), but are
distinctly higher. In fact, 5 out of the 32 values of χ² observed
exceed 6, for which, when n' = 3, P is about 0.05; so that in 5
individual samples we should be led to conclude that the observation
significantly contradicted the hypothesis, and in the aggregate of
32 samples contradicted it conclusively.
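
The figures in this paragraph can be checked in a few lines. For two degrees of freedom the tail of χ² is simply exp(−x/2); the binomial tail then gives the chance of five or more such exceedances in 32 independent samples, supposing the hypothesis true. The function names are hypothetical, and treating the 32 samples as independent trials is an assumption of the sketch.

```python
import math


def chi2_tail_2df(x):
    """P(chi-square > x) for two degrees of freedom (n' = 3)."""
    return math.exp(-x / 2.0)


def binomial_tail(k, n, p):
    """P(at least k successes in n independent trials of probability p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p_exceed = chi2_tail_2df(6.0)
    print("P(chi-square > 6), n' = 3:", round(p_exceed, 4))
    print("P(5 or more of 32 exceed 6):", binomial_tail(5, 32, p_exceed))
```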

2. Reasons for abnormal distribution of χ².

This example illustrates so well the different reasons for which
χ² may be abnormally distributed that these reasons may be
considered in turn. χ² will be abnormally distributed:

(A.) If the hypothesis tested is not in fact true.

The distribution in the population from which Brownlee's
samples were drawn appears to have been in the ratio 1, 4, 6, 4, 1.
This ratio is not reproducible by dissecting a normal curve at equal
intervals of the abscissa. In terms of the standard deviation the
distance from the mean of the limits of the central group would be,
from the Kelley-Wood table [4], 0.488777; the next limits, representing
the points beyond which the tail of the curve is one-sixteenth
of the total area, would be at ± 1.534121; while to include the
whole area the next limits are at infinity. For the hypothesis to
be true, these values should be in the ratio 1 : 3 : 5. If we suppose
the whole tail to be included in the extreme classes, we must still
recognize that the central class, including three-eighths of a normal
curve, stands on a smaller length of abscissa than the adjoining
areas each including one-quarter of the curve. If, therefore, the
values of χ² are found to be excessive, they are only performing
their prime function in indicating the inexactitude of the hypothesis
tested. In fact, with increasing samples, the values of χ² in such
a case should increase without limit, and cannot be expected to
be distributed as in Elderton's table.
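
The two limits quoted from the Kelley-Wood table can be recomputed with the inverse normal distribution function in the Python standard library. The variable names are hypothetical; the ratio printed at the end is to be compared with the ratio 3 that equal class intervals would require.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Central class holds 6/16 of the area, so 3/16 lies on each side of the mean.
inner_limit = std_normal.inv_cdf(0.5 + 3.0 / 16.0)
# Each extreme class holds 1/16 of the area.
outer_limit = std_normal.inv_cdf(1.0 - 1.0 / 16.0)
limit_ratio = outer_limit / inner_limit

if __name__ == "__main__":
    print("inner limit:", round(inner_limit, 6))
    print("outer limit:", round(outer_limit, 6))
    print("ratio outer/inner:", round(limit_ratio, 4), "(equal intervals would need 3)")
```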

Even if a hypothesis be true, the value of χ² obtained will
not measure the goodness of fit, if the method of fitting employed
is inadequate; for in such a case the hypothesis to be tested is
not adequately represented by the series of “expected” frequencies
obtained.

In the first place the distribution of χ² will be abnormal:

(B.) If the method of estimation employed is Inconsistent.

A method of fitting fails to fulfil the criterion of consistency if,
when applied to an infinite sample, i.e. to the whole population,
it fails to reproduce the exact form of the population. Let us
suppose that the frequencies in the five classes of the population
were proportional to the areas of a normal curve divided at ± 0.5,
± 1.5; the fractions in the five classes would then be:

0.0668072, 0.2417303, 0.3829250, 0.2417303, 0.0668072;

from these the second moment, using Sheppard's correction, is
0.934585, whereas the true standard deviation is unity, equal to
the grouping unit. Thus, using an indefinitely large sample, our
method of estimation introduces an error of about 3 per cent. into
our estimate of the standard deviation. Consequently from this
cause also the values of χ² obtained will increase indefinitely as
the size of the sample is increased.
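
The arithmetic of this example may be reproduced as follows. The sketch assumes class centres at −2, −1, 0, 1, 2 in grouping units (the natural reading of the text) and applies Sheppard's correction of 1/12; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
cuts = [-1.5, -0.5, 0.5, 1.5]                      # class boundaries
cdf = [0.0] + [std_normal.cdf(c) for c in cuts] + [1.0]
fractions = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

centres = [-2, -1, 0, 1, 2]                        # assumed class centres
raw_second_moment = sum(f * c * c for f, c in zip(fractions, centres))
corrected_second_moment = raw_second_moment - 1.0 / 12.0   # Sheppard's correction
estimated_sd = math.sqrt(corrected_second_moment)

if __name__ == "__main__":
    print([round(f, 7) for f in fractions])
    print("corrected second moment:", round(corrected_second_moment, 6))
    print("estimated standard deviation:", round(estimated_sd, 4), "(true value 1)")
```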

(C.) If the method of estimation employed is Inefficient.

In any problem of estimation innumerable statistics, all functions
of the observations, may be devised for the estimation of the
required parameter, such that in all cases the error tends to zero
as the size of the sample is increased. Such statistics all satisfy
the criterion of consistency, and may all be termed consistent;
for large samples the sampling distribution of each of them may
tend to normality, with variance inversely proportional to the
number in the sample from which it was calculated, but the variance
of different statistics derived from samples of the same size will
generally be different. We are thus led to specify out of the mass
of consistent statistics, a group characterized by the fact that as
the sample is increased their distribution curves tend to normality
with the least possible variance. Such statistics satisfy the criterion
of efficiency, and may be called efficient statistics. The efficiency
of any other statistic is defined so as to be inversely proportional
to its variance in large samples, the efficiency of efficient statistics
being 100 per cent. For example, it may be proved that in
estimating the mean of a normal distribution, no statistic can be more
efficient than the mean of the sample, and that this has a variance
of σ²/n, where n is the number in the sample. The variance
of the median obtained from a large sample tends to the value
πσ²/2n; consequently, while the efficiency of the mean is 100
per cent., that of the median is only 63.66 per cent.
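
The figure of 63.66 per cent. is 2/π, the ratio of σ²/n to πσ²/2n. A small simulation gives a rough check. The sample size, number of repetitions and seed below are arbitrary choices made for this sketch, and the simulated value is only approximate since the limit refers to large samples.

```python
import math
import random
import statistics

ASYMPTOTIC_EFFICIENCY = 2.0 / math.pi  # variance of mean / variance of median


def simulated_median_efficiency(n=101, reps=4000, seed=1):
    """Ratio var(mean) / var(median) for normal samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Both statistics have true mean zero, so variances are taken about zero.
    return statistics.pvariance(means, 0.0) / statistics.pvariance(medians, 0.0)


if __name__ == "__main__":
    print("asymptotic efficiency:", round(100 * ASYMPTOTIC_EFFICIENCY, 2), "per cent.")
    print("simulated efficiency:", round(100 * simulated_median_efficiency(), 1), "per cent.")
```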

3. Properties of efficient statistics.

I have shown elsewhere [5] that a statistic satisfying the criterion
of efficiency may be found by the Method of Maximum Likelihood,
and that its variance in random samples may be calculated directly
from the frequency distribution of the population. If m is the
expected frequency in any class, and x is the frequency observed,
then any parameter θ, of which the series of values of m are functions,
may be estimated by maximizing

$$L = S(x \log m)$$

for variations of θ; this leads to an equation of the form

$$S\left(\frac{x}{m}\frac{\partial m}{\partial\theta}\right) = 0,$$

from which, in any special case, θ may be obtained. The variance in
random samples of the value so obtained is given by

$$-\frac{1}{\sigma^2} = S\left\{\frac{\partial^2 m}{\partial\theta^2} - \frac{1}{m}\left(\frac{\partial m}{\partial\theta}\right)^2\right\},$$

or, since S(m) is independent of θ, by

$$\frac{1}{\sigma^2} = S\left\{\frac{1}{m}\left(\frac{\partial m}{\partial\theta}\right)^2\right\}.$$

Before connecting these properties with the distribution of χ²,
we may prove two elementary propositions respecting statistics
which satisfy the criterion of efficiency.

1. The correlation between any two estimates of the same parameter
which satisfy the criterion of efficiency tends to +1, as the
sample is increased indefinitely.





Let the variance of each estimate tend to σ²/n as the sample is
increased, and let the correlation between the two estimates be r.
Then the variance of their mean will be

$$\frac{\sigma^2}{n} \cdot \frac{1 + r}{2}.$$

But, by hypothesis, this cannot be less than σ²/n, therefore r cannot
tend to a value less than unity.

2. The correlation between any estimate which satisfies the
criterion of efficiency, and any other consistent estimate of the same
parameter, tends for increasingly large samples to a limit, r, given by

$$r = \sqrt{E},$$

where E is the efficiency of the second statistic.

Let A be the efficient statistic with variance σ²/n and B the
inefficient statistic with variance σ²/En; from them compound a
new statistic C, such that

$$(1 + E - 2r\sqrt{E})\, C = (1 - r\sqrt{E})\, A + (E - r\sqrt{E})\, B;$$

then the variance of C is

$$\frac{\sigma^2}{n} \cdot \frac{1 - r^2}{1 + E - 2r\sqrt{E}} = \frac{\sigma^2}{n} \cdot \frac{1 - r^2}{1 - r^2 + (r - \sqrt{E})^2};$$

if therefore r does not tend to √E as the samples are increased the
variance of C will tend to be less in the limit than the variance of
A, which is impossible. Therefore, in the limit r = √E.

An easy corollary is that the correlation of A with (B − A) is
zero, so that the deviations of B from the population value may be
regarded as made up of two parts: one, an error of random sampling,
properly so called, is the deviation of A from the population value;
the other, distributed independently of the first, is the error of
estimation by which the inferior estimate, B, differs from the superior
estimate, A.
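
The variance of the compound statistic C can be verified numerically. The sketch below computes it directly from the variances and covariance of A and B, taking σ²/n as unity, and compares it with the closed form (1 − r²)/(1 + E − 2r√E). The function names and the trial values of r and E are hypothetical.

```python
import math


def variance_of_C(r, E, var_A=1.0):
    """Variance of C = wA*A + wB*B, with the weights of the text normalised to sum to 1."""
    root_E = math.sqrt(E)
    k = 1.0 + E - 2.0 * r * root_E
    w_A = (1.0 - r * root_E) / k
    w_B = (E - r * root_E) / k
    var_B = var_A / E                       # B has efficiency E
    cov_AB = r * math.sqrt(var_A * var_B)   # correlation r between A and B
    return w_A**2 * var_A + w_B**2 * var_B + 2.0 * w_A * w_B * cov_AB


def closed_form(r, E, var_A=1.0):
    """Closed form for the variance of C."""
    return var_A * (1.0 - r * r) / (1.0 + E - 2.0 * r * math.sqrt(E))


if __name__ == "__main__":
    E = 0.64  # so that sqrt(E) = 0.8
    for r in (0.5, 0.7, 0.8, 0.9):
        print(r, round(variance_of_C(r, E), 6), round(closed_form(r, E), 6))
```

Only at r = √E does the variance of C equal that of A; for any other value it falls below, which is the contradiction used in the proof.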


4. The minimum of χ².

All statistics which satisfy the criterion of efficiency being
equivalent for large samples, it is important in connection with the
χ² test that the method of minimizing χ² is one of them. For

$$\chi^2 = S\left(\frac{x^2}{m}\right) - n,$$

and if this is a minimum for variation of a parameter θ, we find

$$S\left(\frac{x^2}{m^2}\frac{\partial m}{\partial\theta}\right) = S\left(\frac{x^2 - m^2}{m^2}\frac{\partial m}{\partial\theta}\right) = 0.$$

Now, for large samples this equation tends to equivalence with
that obtained by the Method of Maximum Likelihood, for the latter
may be written

$$S\left(\frac{x - m}{m}\frac{\partial m}{\partial\theta}\right) = 0,$$

and, for large samples, the factor

$$\frac{x + m}{m}$$

tends in all classes to the constant value, 2. Hence, all methods of
fitting involving only efficient statistics tend, for large samples, to
minimize χ².

5. The effect on χ² of substituting for the true value of any parameter
an estimate of it derived from sample.

Let m stand for the frequency in any class expected from the
true value of the parameter, and m' the corresponding frequency
calculated from an efficient estimate.

Let

$$\chi^2 = S\left(\frac{x^2}{m}\right) - n$$

and

$$\chi'^2 = S\left(\frac{x^2}{m'}\right) - n;$$

then

$$\chi^2 - \chi'^2 = S\left\{x^2\left(\frac{1}{m} - \frac{1}{m'}\right)\right\}.$$

The difference of the reciprocals of m and m' will depend on the
difference δθ between the true value of the parameter and its value
derived from the sample. Since δθ decreases proportionately to
n^(−1/2) as the size of the sample is increased, we shall expand the above
expression in powers of δθ, noticing that since both x and m increase
proportionately to n, we shall have to carry the expansion as far
as the term in (δθ)², while in that term factors which tend to unity
with increasing sample, such as x/m', may be omitted. Now

$$\frac{1}{m} - \frac{1}{m'} = -\frac{1}{m'^2}\frac{\partial m'}{\partial\theta}\,\delta\theta + \left\{\frac{2}{m'^3}\left(\frac{\partial m'}{\partial\theta}\right)^2 - \frac{1}{m'^2}\frac{\partial^2 m'}{\partial\theta^2}\right\}\frac{(\delta\theta)^2}{2};$$

but, since χ'² has been made a minimum,

$$S\left(\frac{x^2}{m'^2}\frac{\partial m'}{\partial\theta}\right) = 0.$$





Hence

$$\chi^2 - \chi'^2 = \frac{(\delta\theta)^2}{2}\, S\left\{\frac{2}{m'}\left(\frac{\partial m'}{\partial\theta}\right)^2 - \frac{\partial^2 m'}{\partial\theta^2}\right\}.$$

Moreover, for any efficient statistic,

$$\frac{1}{\sigma^2} = S\left\{\frac{1}{m}\left(\frac{\partial m}{\partial\theta}\right)^2\right\};$$

consequently the amount, by which χ² is reduced, is the square
of a quantity normally distributed with unit standard deviation.
The substitution thus diminishes the average value of χ² by unity,
and this alone shows that if χ'² is still distributed in the Type III
distribution given by Elderton's table, the value of n' with which the
table is entered must be reduced by unity. It is, however, apparent,
since θ has been found by a process equivalent to making χ'² a
minimum, that χ'² is distributed independently of the additional
square, (δθ)²/σ², and since χ² is distributed as is the sum of the
squares of a number of quantities distributed normally and independently
each with unit standard deviation, it is necessary that χ'²
should be distributed as is the sum of the squares of a number smaller
by unity of such quantities, and consequently the Type III distribution
is always reproduced.

If, however, χ₁² is the value obtained by using an inefficient
statistic of efficiency E, then we find as above

$$\chi_1^2 - \chi'^2 = \frac{(\delta\theta)^2}{\sigma^2},$$

where σ² is the variance of an efficient statistic, and δθ is the error
of estimation by which the inefficient statistic differs from the
efficient one. The mean value of (δθ)² is

$$\sigma^2\left(\frac{1}{E} - 1\right);$$

consequently the mean value of χ₁² may be found from that of χ²
by subtracting 2 − 1/E. In this case, however, the distribution is
not the Type III characteristic of χ². It will be noticed that with
efficiencies below 50 per cent., the mean value of χ² is less than that
of χ₁², so that the reconstructed population is generally less like
the sample than is the population from which the sample was drawn.
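
The rule that the mean of χ² falls by 2 − 1/E may be tabulated in a few lines. The sketch assumes a single fitted parameter and large samples, so that χ² from the true population has mean one less than the number of classes; the function names are hypothetical.

```python
import math


def mean_reduction(E):
    """Amount subtracted from the mean of chi-square when a statistic of efficiency E is used."""
    return 2.0 - 1.0 / E


def mean_chi2_after_fitting(n_classes, E):
    """Large-sample mean of chi-square after fitting one parameter with efficiency E."""
    return (n_classes - 1) - mean_reduction(E)


if __name__ == "__main__":
    for E in (1.0, 2.0 / math.pi, 0.5, 0.25):
        print(round(E, 4), round(mean_reduction(E), 4), round(mean_chi2_after_fitting(5, E), 4))
```

At E = 1 the reduction is exactly unity; at E = 1/2 it vanishes; below that the fitted population agrees with the sample worse, on the average, than the true one.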

The effect of using statistics, therefore, which are inconsistent
is to make the value of χ² increase indefinitely with the size of the
sample; consistent statistics which are somewhat inefficient disturb
the distribution by altering the mean, and in other ways.
When such are used the value of χ² obtained does not measure
merely the deviations of observation from hypothesis, but includes
also deviations due to errors in the estimation of the parameters.
Consequently such values should not be entered in Elderton's table in
testing goodness of fit, but, if such tests are intended, the small corrections
should be applied by which efficient statistics may be
obtained.

The cases in which Dr. Brownlee’s experiments have verified the
theoretical distribution of $\chi^2$ have all been obtained, actually or by
approximation, by making $\chi^2$ a minimum. The theoretical distribution
would equally have appeared if any other efficient method
had been used. For example, the five frequencies, $\alpha, \beta, \gamma, \delta, \epsilon$,
might have been fitted with the binomial distribution

$$256\left\{\left(\tfrac{1}{2} + \eta\right) + \left(\tfrac{1}{2} - \eta\right)\right\}^4$$

by taking

$$1024\,\eta = 2(\alpha - \epsilon) + (\beta - \delta).$$
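
The fitting rule just quoted can be tried numerically. The following is a minimal sketch, assuming that the five classes are ordered from four successes (alpha) down to none (epsilon); the function names and the class counts are hypothetical and serve only as an illustration.

```python
from math import comb

def fit_eta(freqs):
    """Estimate eta from five class counts ordered alpha..epsilon.

    Assumption: alpha is the class with 4 successes, epsilon the class with 0.
    For a total of 256 this is 1024*eta = 2(alpha - epsilon) + (beta - delta).
    """
    alpha, beta, gamma, delta, epsilon = freqs
    n = sum(freqs)
    return (2 * (alpha - epsilon) + (beta - delta)) / (4 * n)

def expected(n, eta):
    """Expected counts n * {(1/2 + eta) + (1/2 - eta)}^4, same class order."""
    p = 0.5 + eta
    return [n * comb(4, k) * p ** k * (1 - p) ** (4 - k) for k in (4, 3, 2, 1, 0)]

counts = [20, 70, 96, 55, 15]          # hypothetical sample of 256
eta_hat = fit_eta(counts)
print(eta_hat)                         # fitted eta
print(expected(sum(counts), eta_hat))  # fitted expectations
```

Because the rule is linear in the frequencies, applying it to the expected counts themselves returns the value of eta from which they were built.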

It is not necessary for our purpose to push refinement in methods
of fitting beyond the requirement that all statistics used should be
fully efficient, for the distribution is in any case only exact when
the sample is increased without limit, and in these circumstances all
efficient statistics are equivalent. Only in an enquiry into the
accuracy of the distribution for small samples would such further
refinements be required, and it is by no means obvious with small
samples (i) that the method of minimizing $\chi^2$ possesses any advantage
over other efficient methods, or (ii) that the form of $\chi^2$, without
modification, provides the ideal measure of discrepancy for small
samples. The method of maximum likelihood, for example [5],
minimizes the quantity

$$S\left(x \log\frac{x}{m}\right) = S\left\{\frac{1}{2}\frac{(x-m)^2}{m} - \frac{1}{6}\frac{(x-m)^3}{m^2} + \frac{1}{12}\frac{(x-m)^4}{m^3} - \ldots\right\}$$

of which $\frac{1}{2}\chi^2$ is the limit as the samples are indefinitely increased.
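
The limiting relation can be watched numerically. The sketch below is an illustration under stated assumptions: the class probabilities and the pattern of deviations are hypothetical, and the deviations are made to grow as the square root of the sample size, as sampling errors do.

```python
import math

def likelihood_quantity(observed, expected):
    """S(x log(x/m)), the quantity minimised by maximum likelihood."""
    return sum(x * math.log(x / m) for x, m in zip(observed, expected) if x > 0)

def chi_square(observed, expected):
    """Pearson's chi-square, S((x - m)^2 / m)."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))

def ratio(n, probs=(0.2, 0.3, 0.5), pattern=(0.3, -0.1, -0.2)):
    """Ratio of S(x log(x/m)) to chi^2 / 2 for deviations of order sqrt(n)."""
    expected = [n * p for p in probs]
    observed = [m + e * math.sqrt(n) for m, e in zip(expected, pattern)]
    return likelihood_quantity(observed, expected) / (0.5 * chi_square(observed, expected))

for n in (100, 10_000, 1_000_000):
    print(n, ratio(n))  # approaches 1 as n grows
```

The departure from unity comes from the cubic and higher terms of the series, which shrink in proportion to the inverse square root of the sample size.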



References. 

1. J. Brownlee (1924).—“Some Experiments to Test the Theory
of Goodness of Fit.” J.R.S.S., Vol. LXXXVII, pp. 76-82.

2. K. Pearson (1900).—“On the Criterion that a given System
of Deviations from the Probable in the case of a Correlated System of
Variables is such that it can be reasonably supposed to have arisen
from random sampling.” Phil. Mag., Series 5, Vol. L, pp. 157-175.

3. R. A. Fisher (1922).—“On the Interpretation of $\chi^2$ from
Contingency Tables, and the calculation of P.” J.R.S.S., Vol.
LXXXV, pp. 87-94.

4. T. L. Kelley (1923).—“Statistical Method,” 390 pp.
Macmillan Co., New York.

5. R. A. Fisher (1921).—“On the Mathematical Foundations
of Theoretical Statistics.” Phil. Trans., A, Vol. 222, pp. 309-368.



9.94a 


ON A PROPERTY CONNECTING THE CHI- 
SQUARE MEASURE OF DISCREPANCY WITH 
THE METHOD OF MAXIMUM LIKELIHOOD 


AUTHOR’S NOTE

A more general and systematic exposition than had previously been
attempted of the connection between the $\chi^2$ measure of discrepancy
between fitted expectations and observational frequencies, and the
method of fitting by maximum likelihood.


* Reprinted from Atti del Congresso Internazionale dei Matematici, Bologna,
Vol. VI, pp. 96-100, 1928.



9.95 


R. A. Fisher (Rothamsted-Herts - Inghilterra) 


ON A PROPERTY CONNECTING THE $\chi^2$ MEASURE
OF DISCREPANCY WITH THE METHOD OF MAXIMUM LIKELIHOOD


1. - Introductory. — The measure of discrepancy, $\chi^2$, between observation
and hypothesis conforms to its well known series of distributions only in the
limit when the number of observations tends to infinity; and in the theory of
finite samples it is not obvious that this measure has any unique merit. In the
theory of large samples it has been shown (1) that the test of Goodness of Fit
based upon it is valid only if all statistics used in estimating adjustable parameters
satisfy the criterion of efficiency, and further that all efficient statistics
tend in large samples to equivalence. Recently, however (2), in the detailed
investigation of a simple sampling problem arising in genetics the writer found
that $\chi^2$ is specially related to a particular type of efficient solution, namely to
that obtained by the method of maximum likelihood. In view of the theoretical
and practical importance of solutions obtained by this method in the exact theory
of finite samples, it is of interest to trace how general the observed relationship
may be, and to ascertain its bearing upon the interpretation of $\chi^2$ derived from
finite samples.

(1) R. A. Fisher (1924): The conditions under which $\chi^2$ measures the discrepancy between
observation and hypothesis. Journal of the Royal Statistical Society, LXXXVII, pp. 442-449.

(2) R. A. Fisher (1928): Statistical Methods for Research Workers, Edinburgh, Oliver &
Boyd, 2nd Edition, XI + 260 p.

2. - A particular example. — In the particular case examined four types of
offspring may occur, the expectations from a sample of n being

$$\frac{n}{4}\,(2+\theta,\ 1-\theta,\ 1-\theta,\ \theta)$$

in which $\theta$ is an adjustable parameter depending on the linkage between the two
genetic factors concerned; if

$$a_1,\ a_2,\ a_3,\ a_4$$

are the numbers observed in the four classes, it is evident that the two expressions

$$x = a_1 - 3a_2 + a_3 - 3a_4$$

$$y = a_1 + a_2 - 3(a_3 + a_4)$$

have each an expectation zero for all values of $\theta$, and that the random sampling
distribution of each may be derived from the binomial expansion

$$\left(\tfrac{3}{4} + \tfrac{1}{4}\right)^n$$

in entire independence of the value of $\theta$.

In the comparison of four observed frequencies with a series of expectations
amounting to the same total, three degrees of freedom will be available for
discrepancies; if, however, the expectations have been calculated from the
observations to which they are to be compared by the use of one adjustable parameter
we may anticipate that the degrees of freedom left available for discrepancies
will be reduced to two. We may identify these with x and y since these represent
discrepancies which are not affected by modifications of the value of $\theta$.

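For any value of $\theta$ the value of $\chi^2$ may be written

$$\chi^2 = \frac{4}{n}\left\{\frac{a_1^2}{2+\theta} + \frac{a_2^2 + a_3^2}{1-\theta} + \frac{a_4^2}{\theta}\right\} - n$$

which, for a given value of n, is a quadratic function of the frequencies; if, for
our two chosen components x and y we form the quadratic expression

$$Q^2 = \frac{1}{1-\rho^2}\left\{\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right\}, \tag{1}$$

and substitute for $\sigma_1^2$ and $\sigma_2^2$ the mean values of $x^2$ and $y^2$, and for $\rho\sigma_1\sigma_2$ the
mean value of xy, we obtain

$$Q^2 = \frac{3x^2 - 2(4\theta - 1)xy + 3y^2}{8n(1-\theta)(1+2\theta)}$$

which is also, for a given value of n, a quadratic function of the frequencies.
On comparing the two expressions term by term it appears that

$$\chi^2 - Q^2 = \left\{\frac{a_1}{2+\theta} - \frac{a_2 + a_3}{1-\theta} + \frac{a_4}{\theta}\right\}^2 \frac{2\theta(1-\theta)(2+\theta)}{(1+2\theta)n}.$$

The value of $\chi^2 - Q^2$ is therefore always positive, except for the special
value of $\theta$ for which

$$\frac{a_1}{2+\theta} - \frac{a_2 + a_3}{1-\theta} + \frac{a_4}{\theta} = 0$$

which is the equation for the estimate of $\theta$ provided by the method of maximum
likelihood. For this method maximises

$$L = a_1 \log(2+\theta) + (a_2 + a_3)\log(1-\theta) + a_4 \log\theta$$

whence

$$\frac{\partial L}{\partial\theta} = \frac{a_1}{2+\theta} - \frac{a_2 + a_3}{1-\theta} + \frac{a_4}{\theta}.$$

Moreover

$$\frac{\partial^2 L}{\partial\theta^2} = -\frac{a_1}{(2+\theta)^2} - \frac{a_2 + a_3}{(1-\theta)^2} - \frac{a_4}{\theta^2}$$

of which the mean value is

$$-\frac{(1+2\theta)n}{2\theta(1-\theta)(2+\theta)}$$

which, with changed sign, is the quotient in our expression for $\chi^2 - Q^2$;
otherwise interpreted, the quantity of information relative to $\theta$ which the data
contain.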

The use of $\chi^2$ from finite samples is therefore exactly equivalent to the use
of the bivariate expression (1) when the method of maximum likelihood is
employed. The difference $\chi^2 - Q^2$ may be regarded as that part of the discrepancy
between observation and hypothesis which is due to imperfect methods
in the estimation of $\theta$. This part will be large even in large samples if inefficient
methods are used; but with efficient methods other than the method of maximum
likelihood it may be expected to be sufficiently small for sufficiently
large samples.
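
The exactness of this resolution for finite samples can be checked by direct arithmetic. The sketch below is illustrative only: the class counts are hypothetical values chosen for the demonstration, the function names are introduced here, and x and y are taken to be the two single-factor contrasts of the linkage example, each of which has expectation zero for every value of the parameter.

```python
import math

def chi_square(a, theta):
    """chi^2 = S(a^2 / m) - n with expectations (n/4)(2+t, 1-t, 1-t, t)."""
    n = sum(a)
    m = [n * (2 + theta) / 4, n * (1 - theta) / 4, n * (1 - theta) / 4, n * theta / 4]
    return sum(ai ** 2 / mi for ai, mi in zip(a, m)) - n

def q_square(a, theta):
    """Bivariate quadratic form in x and y with variances 3n, covariance n(4t-1)."""
    a1, a2, a3, a4 = a
    n = sum(a)
    x = a1 - 3 * a2 + a3 - 3 * a4
    y = a1 + a2 - 3 * (a3 + a4)
    return (3 * x * x - 2 * (4 * theta - 1) * x * y + 3 * y * y) / (
        8 * n * (1 - theta) * (1 + 2 * theta))

def score(a, theta):
    """dL/dtheta for L = a1 log(2+t) + (a2+a3) log(1-t) + a4 log t."""
    a1, a2, a3, a4 = a
    return a1 / (2 + theta) - (a2 + a3) / (1 - theta) + a4 / theta

def information(n, theta):
    """Mean value of -d2L/dtheta2."""
    return (1 + 2 * theta) * n / (2 * theta * (1 - theta) * (2 + theta))

def ml_estimate(a):
    """Positive root of n t^2 - (a1 - 2a2 - 2a3 - a4) t - 2 a4 = 0."""
    a1, a2, a3, a4 = a
    n = sum(a)
    b = a1 - 2 * a2 - 2 * a3 - a4
    return (b + math.sqrt(b * b + 8 * n * a4)) / (2 * n)

counts = [1997, 906, 904, 32]  # hypothetical class counts for illustration
for theta in (0.02, 0.25, 0.6):
    gap = chi_square(counts, theta) - q_square(counts, theta)
    print(theta, gap, score(counts, theta) ** 2 / information(sum(counts), theta))

theta_hat = ml_estimate(counts)
print(theta_hat, chi_square(counts, theta_hat) - q_square(counts, theta_hat))
```

For every trial value of the parameter the two printed quantities agree, and at the maximum likelihood value the difference vanishes, so that the whole of $\chi^2$ is then carried by x and y.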

3. - The statistic defined by an equation linear in the frequencies, which
is also efficient. — The particular instance examined is special in that the
frequencies are expressible as linear functions of the unknown parameter. The
connection between $\chi^2$ and the maximum likelihood solution does not, however,
flow from this fact, but from the fact, which is true in general, that the equation
of maximum likelihood is linear in the frequencies. The maximum likelihood
equation may indeed be derived from the conditions that it shall be linear in
the frequencies, and efficient for all values of $\theta$.

Consider any statistic T defined as the relevant root of an equation of the form

$$X = S(ka) = 0,$$

in which a stands for the frequency in any class, k for a coefficient, which is to
be a function of $\theta$, and S for summation over all classes. Then the sampling
variance of T from large samples may be equated to the sampling variance of X,
for a given value of $\theta$, divided by the square of the mean value of $\partial X/\partial\theta$.

But, if we let p denote the probability of any class,

$$V(X) = n\,S(pk^2);$$

and the mean value of $\partial X/\partial\theta$ is given by,

$$\overline{\partial X/\partial\theta} = n\,S\left(p\,\frac{\partial k}{\partial\theta}\right).$$

If now the statistic is consistent

$$S(pk) = 0$$

for all values of $\theta$, and hence

$$S\left(p\,\frac{\partial k}{\partial\theta}\right) + S\left(k\,\frac{\partial p}{\partial\theta}\right) = 0,$$

so we may write

$$V(T) = \frac{S(pk^2)}{n\,S^2\left(k\,\dfrac{\partial p}{\partial\theta}\right)}.$$

If we minimise this for variations of k it appears that, for each class,

$$\frac{kp}{S(pk^2)} = \frac{\partial p/\partial\theta}{S(k\,\partial p/\partial\theta)},$$

so that k is proportional to $\frac{1}{p}\frac{\partial p}{\partial\theta}$. Using these functions we find

$$V(T) = \frac{1}{n\,S\left\{\dfrac{1}{p}\left(\dfrac{\partial p}{\partial\theta}\right)^2\right\}},$$

and the equation for T becomes

$$S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta}\right\} = 0,$$

which is in fact the equation of maximum likelihood.
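
The minimum property of the maximum likelihood coefficients can be probed numerically. The sketch below is a hedged illustration, not part of the original argument: it borrows the four-class linkage expectations of the preceding section, takes an arbitrary parameter value and sample size, and compares the large-sample variance given by the maximum likelihood coefficients with that of randomly drawn consistent coefficients. All names are introduced here.

```python
import random

def linkage_probs(theta):
    """Class probabilities (2+t, 1-t, 1-t, t)/4 and their derivatives in t."""
    p = [(2 + theta) / 4, (1 - theta) / 4, (1 - theta) / 4, theta / 4]
    dp = [0.25, -0.25, -0.25, 0.25]
    return p, dp

def large_sample_variance(k, p, dp, n):
    """V(T) = S(p k^2) / (n S(k dp)^2), valid when S(p k) = 0."""
    numerator = sum(pi * ki * ki for pi, ki in zip(p, k))
    denominator = n * sum(ki * di for ki, di in zip(k, dp)) ** 2
    return numerator / denominator

def make_consistent(k, p):
    """Shift the coefficients by a constant so that S(p k) = 0."""
    mean = sum(pi * ki for pi, ki in zip(p, k))
    return [ki - mean for ki in k]

theta, n = 0.3, 1000  # illustrative values
p, dp = linkage_probs(theta)

# Maximum likelihood coefficients: k proportional to (1/p) dp/dtheta.
k_ml = [di / pi for di, pi in zip(dp, p)]
v_ml = large_sample_variance(k_ml, p, dp, n)

random.seed(0)
others = []
for _ in range(1000):
    k = make_consistent([random.uniform(-1, 1) for _ in p], p)
    others.append(large_sample_variance(k, p, dp, n))

print(v_ml, min(others))  # no random choice falls below v_ml
```

That no competitor can fall below the maximum likelihood value follows from the Cauchy-Schwarz inequality applied to the sums in the formula for V(T); the random search merely displays it.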


4. - The resolution of $\chi^2$ into its components. — In general with any number
of parameters $\theta_1, \ldots, \theta_r$ to be estimated, the equations of maximum likelihood
will be

$$S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_1}\right\} = 0,\ \ \ldots,\ \ S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_r}\right\} = 0.$$

Any linear function of the frequencies,

$$x = S(ka),$$

will have a mean value zero provided that

$$S(kp) = 0;$$

and this mean will be stationary for variations of $\theta_1, \ldots, \theta_r$ provided the r conditions,

$$S\left(k\,\frac{\partial p}{\partial\theta_1}\right) = 0,\ \ \ldots,\ \ S\left(k\,\frac{\partial p}{\partial\theta_r}\right) = 0 \tag{2}$$

are fulfilled. If the number of classes is s it will then be possible to form s − r − 1
quantities x fulfilling these conditions, and such that for any two of them $x_c$
and $x_d$ the condition

$$S(p\,k_c k_d) = 0 \tag{3}$$

shall also be satisfied.

Consider now the quantities

$$\frac{a - pn}{\sqrt{pn}}$$

as rectangular coordinates of a point in s dimensions. Since $S(kp) = 0$,
it follows that

$$\frac{x}{\sqrt{S(npk^2)}} = S\left\{\frac{k\sqrt{p}}{\sqrt{S(pk^2)}}\cdot\frac{a - pn}{\sqrt{pn}}\right\},$$

the s − r − 1 values of which represent the coordinates of the same point with
respect to s − r − 1 new axes, while the condition (3) shows that these new axes
are mutually orthogonal and (2) shows that they lie in the surfaces

$$S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_1}\right\} = 0,\ \ \ldots,\ \ S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_r}\right\} = 0,$$

and evidently also in

$$S(a) = n.$$

But

$$\chi^2 = S\left\{\frac{(a - pn)^2}{pn}\right\}$$

is the square of the distance of the sample point from the origin, and

$$\sum_c \frac{x_c^2}{S(npk_c^2)},$$

which we may now write, $Q^2$, must represent the square of its distance from
the generalised surface

$$x_1 = x_2 = \ldots = 0;$$

and these quantities will necessarily be the same if the remaining r coordinates,
which can be built up as linear functions of

$$S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_1}\right\},\ \ \ldots,\ \ S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_r}\right\},$$

are all zero. In fact $\chi^2 - Q^2$ must always be a positive quantity expressible as
a homogeneous quadratic function of the quantities

$$S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_1}\right\},\ \ \ldots,\ \ S\left\{\frac{a}{p}\frac{\partial p}{\partial\theta_r}\right\},$$

$Q^2$ being the sum of the squares of s − r − 1 linear functions of the frequencies
the mean value of each of which is stationary for variations of the parameters.

It follows that, in the theory of large samples, we may always speak of $\chi^2$
as made up of two parts, one of which is due to errors of estimation, and vanishes
when estimation is efficient, while the other is distributed in random samples as
is the sum of the squares of s − r − 1 quantities each normally distributed about
zero with unit standard deviation. In the theory of finite samples the former
portion vanishes only when the adjustable parameters are estimated by the
method of maximum likelihood, while the latter is the sum of the squares of
quantities distributed, generally discontinuously, but each with unit standard
deviation, and without mutual correlation for the particular set of parametric
values arrived at by the method of maximum likelihood, and such that the
mean of each is stationary at zero for variations of the parameters in the
neighbourhood of these values.
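
The general resolution may be illustrated on a case other than the linkage example. The sketch below assumes a three-class model with one parameter, with class probabilities $\theta^2$, $2\theta(1-\theta)$, $(1-\theta)^2$; this model, the counts, and the function names are all chosen here for illustration and do not come from the paper. With s = 3 and r = 1 there is s − r − 1 = 1 component x, whose coefficients k must satisfy S(kp) = 0 and S(k ∂p/∂θ) = 0; in three classes the vector product of p and ∂p/∂θ supplies such coefficients.

```python
def model(theta):
    """Hypothetical three-class model and its derivative in theta."""
    p = [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]
    dp = [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]
    return p, dp

def cross(u, v):
    """Vector product: orthogonal to both u and v in the ordinary sense."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def decompose(a, theta):
    """Return chi^2, Q^2 and the part of chi^2 due to errors of estimation."""
    n = sum(a)
    p, dp = model(theta)
    chi2 = sum((ai - n * pi) ** 2 / (n * pi) for ai, pi in zip(a, p))
    k = cross(p, dp)                       # S(kp) = 0 and S(k dp) = 0
    x = sum(ki * ai for ki, ai in zip(k, a))
    q2 = x * x / (n * sum(pi * ki * ki for pi, ki in zip(p, k)))
    score = sum(ai * di / pi for ai, di, pi in zip(a, dp, p))
    info = n * sum(di * di / pi for di, pi in zip(dp, p))
    return chi2, q2, score * score / info

counts = [30, 50, 20]                      # hypothetical counts
theta_hat = (2 * counts[0] + counts[1]) / (2 * sum(counts))  # root of the score
for theta in (0.3, theta_hat, 0.8):
    chi2, q2, estimation_part = decompose(counts, theta)
    print(theta, chi2, q2 + estimation_part)
```

At the maximum likelihood value the estimation part is zero and $\chi^2$ coincides with $Q^2$; elsewhere the two parts add up to $\chi^2$ exactly, whatever the sample size.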



10.308a 


10

ON THE MATHEMATICAL FOUNDATIONS OF 
THEORETICAL STATISTICS 

AUTHOR’S NOTE 

This is the first large-scale attack on the problem of estimation. In 
the author’s opinion the frequently stated view that the concept of 
a best estimate was arbitrary and subjective ignored the guidance 
afforded by purely mathematical considerations of absolute validity. 
He had been impressed by the property of sufficiency, first found in 
1920 (Paper 2). 

This property, when it exists, picks out one particular method of
estimation as uniquely superior to all possible alternatives. In such
cases the sufficient estimate may be found by the method of maximum
likelihood. Consequently, one purpose of the paper is to examine
the properties of the likelihood function, here defined, and the
properties of the estimates arrived at by maximising this function.
Several fruitful ideas, such as the measurement of the amount of
information, emerge. These were classified and developed further
in 1925 (Paper 11).

On these further points the present paper is only of historical interest,
for the author was by no means clear on such points as whether
a sufficient statistic, or something with equivalent advantages, could
always exist, and the extension of the theory towards an exact treatment
of small samples is consequently incomplete. He did not clearly
see, for example, that the variance of an estimate does not, in the 
theory of small samples, supply a satisfactory basis for comparison. 
The correct treatment is, however, foreshadowed on page 10.350. 

His object was to test the value of the new ideas by applying them 
to a variety of problems. An older or more judicious writer would 
not have allowed the course of the argument to be interrupted by 
such long excursuses as that giving the exact treatment of corrections
for grouping; or that in which the efficiency of fitting Pearsonian
curves by moments is examined. These were, however,
questions for which at the time no analytically competent discussion 
existed. After all, it is a common weakness of young authors to put 
too much into their papers. 

* Reprinted from Philosophical Transactions of the Royal Society of London,
Series A, Vol. 222, pp. 309-368, 1922.



[ 309 ] 


10.309 


IX. On the Mathematical Foundations of Theoretical Statistics. 

By R. A. Fisher, M.A., Fellow of Gonville and Caius College, Cambridge, Chief
Statistician, Rothamsted Experimental Station, Harpenden.

Communicated by Dr. E. J. Russell, F.R.S. 


Contents. 

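Section                                                                          Page

1. The Neglect of Theoretical Statistics . . . . . . . . . . . . . . . . . . . . 310

2. The Purpose of Statistical Methods . . . . . . . . . . . . . . . . . . . . . . 311

3. The Problems of Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . 313

4. Criteria of Estimation

5. Examples of the Use of Criterion of Consistency . . . . . . . . . . . . . . . . 317

6. Formal Solution of Problems of Estimation . . . . . . . . . . . . . . . . . . 323

7. Satisfaction of the Criterion of Sufficiency . . . . . . . . . . . . . . . . . 330

8. The Efficiency of the Method of Moments in Fitting Curves of the Pearsonian Type III . . 332

9. Location and Scaling of Frequency Curves in general . . . . . . . . . . . . . 338

10. The Efficiency of the Method of Moments in Fitting Pearsonian Curves . . . . 342

11. The Reason for the Efficiency of the Method of Moments in a Small Region
    surrounding the Normal Curve . . . . . . . . . . . . . . . . . . . . . . . . 355

12. Discontinuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 356

    (1) The Poisson Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 359

    (2) Grouped Normal Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 359

    (3) Distribution of Observations in a Dilution Series . . . . . . . . . . . . 363

13. Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366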


Definitions. 

Centre of Location. —That abscissa of a frequency curve for which the sampling errors 
of optimum location are uncorrelated with those of optimum scaling. (9.) 

Consistency. — A statistic satisfies the criterion of consistency, if, when it is calculated 
from the whole population, it is equal to the required parameter. (4.) 

Distribution. —Problems of distribution are those in which it is required to calculate
the distribution of one, or the simultaneous distribution of a number, of functions of
quantities distributed in a known manner. (3.)

Efficiency. —The efficiency of a statistic is the ratio (usually expressed as a percentage)
which its intrinsic accuracy bears to that of the most efficient statistic possible. It
expresses the proportion of the total available relevant information of which that
statistic makes use. (4 and 10.)

Efficiency (Criterion). —The criterion of efficiency is satisfied by those statistics which,
when derived from large samples, tend to a normal distribution with the least possible 
standard deviation. (4.) 

Estimation. —Problems of estimation are those in which it is required to estimate the 
value of one or more of the population parameters from a random sample of the 
population. (3.) 

Intrinsic Accuracy. —The intrinsic accuracy of an error curve is the weight in large 
samples, divided by the number in the sample, of that statistic of location which satisfies 
the criterion of sufficiency. (9.)

Isostatistical Regions. —If each sample be represented in a generalized space of which 
the observations are the co-ordinates, then any region throughout which any set of 
statistics have identical values is termed an isostatistical region. 

Likelihood. —The likelihood that any parameter (or set of parameters) should have 
any assigned value (or set of values) is proportional to the probability that if this were 
so, the totality of observations should be that observed. 

Location. —The location of a frequency distribution of known form and scale is the 
process of estimation of its position, with respect to each of the several variates. (8.) 

Optimum. —The optimum value of any parameter (or set of parameters) is that value 
(or set of values) of which the likelihood is greatest. (6.) 

Scaling. —The scaling of a frequency distribution of known form is the process of 
estimation of the magnitudes of the deviations of each of the several variates. (8.) 

Specification. —Problems of specification are those in which it is required to specify 
the mathematical form of the distribution of the hypothetical population from which
a sample is to be regarded as drawn. (3.) 

Sufficiency. —A statistic satisfies the criterion of sufficiency when no other statistic
which can be calculated from the same sample provides any additional information as 
to the value of the parameter to be estimated. (4.) 

Validity. —The region of validity of a statistic is the region comprised within its 
contour of zero efficiency. (10.) 

1. The Neglect of Theoretical Statistics. 

Several reasons have contributed to the prolonged neglect into which the study of 
statistics, in its theoretical aspects, has fallen. In spite of the immense amount of 
fruitful labour which has been expended in its practical applications, the basic principles 
of this organ of science are still in a state of obscurity, and it cannot be denied that, 
during the recent rapid development of practical methods, fundamental problems have 
been ignored and fundamental paradoxes left unresolved. This anomalous state of 
statistical science is strikingly exemplified by a recent paper (1) entitled “The Fundamental
Problem of Practical Statistics,” in which one of the most eminent of modern
statisticians presents what purports to be a general proof of Bayes' postulate, a proof
which, in the opinion of a second statistician of equal eminence, “seems to rest upon a
very peculiar—not to say hardly supposable—relation.” (2.)

Leaving aside the specific question here cited, to which we shall recur, the obscurity 
which envelops the theoretical bases of statistical methods may perhaps be ascribed 
to two considerations. In the first place, it appears to be widely thought, or rather
felt, that in a subject in which all results are liable to greater or smaller errors, precise 
definition of ideas or concepts is, if not impossible, at least not a practical necessity. 
In the second place, it has happened that in statistics a purely verbal confusion has 
hindered the distinct formulation of statistical problems ; for it is customary to apply 
the same name, mean, standard deviation, correlation coefficient, etc., both to the true
value which we should like to know, but can only estimate, and to the particular value 
at which we happen to arrive by our methods of estimation ; so also in applying the 
term probable error, writers sometimes would appear to suggest that the former quantity, 
and not merely the latter, is subject to error. 

It is this last confusion, in the writer’s opinion, more than any other, which has led 
to the survival to the present day of the fundamental paradox of inverse probability, 
which like an impenetrable jungle arrests progress towards precision of statistical 
concepts. The criticisms of Boole, Venn, and Chrystal have done something towards
banishing the method, at least from the elementary text-books of Algebra ; but though 
we may agree wholly with Chrystal that inverse probability is a mistake (perhaps the 
only mistake to which the mathematical world has so deeply committed itself), there 
yet remains the feeling that such a mistake would not have captivated the minds of 
Laplace and Poisson if there had been nothing in it but error. 

2. The Purpose of Statistical Methods.

In order to arrive at a distinct formulation of statistical problems, it is necessary to 
define the task which the statistician sets himself: briefly, and in its most concrete 
form, the object of statistical methods is the reduction of data. A quantity of data, 
which usually by its mere bulk is incapable of entering the mind, is to be replaced by 
relatively few quantities which shall adequately represent the whole, or which, in other 
words, shall contain as much as possible, ideally the whole, of the relevant information 
contained in the original data. 

This object is accomplished by constructing a hypothetical infinite population, of 
which the actual data are regarded as constituting a random sample. The law of distribution
of this hypothetical population is specified by relatively few parameters, which
are sufficient to describe it exhaustively in respect of all qualities under discussion. 
Any information given by the sample, which is of use in estimating the values of these 
parameters, is relevant information. Since the number of independent facts supplied in
the data is usually far greater than the number of facts sought, much of the information
supplied by any actual sample is irrelevant. It is the object of the statistical processes 
employed in the reduction of data to exclude this irrelevant information, and to isolate 
the whole of the relevant information contained in the data. 

When we speak of the probability of a certain object fulfilling a certain condition, we 
imagine all such objects to be divided into two classes, according as they do or do not 
fulfil the condition. This is the only characteristic in them of which we take cognisance. 
For this reason probability is the most elementary of statistical concepts. It is a parameter
which specifies a simple dichotomy in an infinite hypothetical population, and it
represents neither more nor less than the frequency ratio which we imagine such a 
population to exhibit. For example, when we say that the probability of throwing a 
five with a die is one-sixth, we must not be taken to mean that of any six throws with 
that die one and one only will necessarily be a five ; or that of any six million 
throws, exactly one million will be fives ; but that of a hypothetical population of an 
infinite number of throws, with the die in its original condition, exactly one-sixth will
be fives. Our statement will not then contain any false assumption about the actual 
die, as that it will not wear out with continued use, or any notion of approximation, as 
in estimating the probability from a finite sample, although this notion may be logically 
developed once the meaning of probability is apprehended. 

The concept of a discontinuous frequency distribution is merely an extension of that of
a simple dichotomy, for though the number of classes into which the population is
divided may be infinite, yet the frequency in each class bears a finite ratio to that of the
whole population. In frequency curves, however, a second infinity is introduced. No
finite sample has a frequency curve : a finite sample may be represented by a histogram, 
or by a frequency polygon, which to the eye more and more resembles a curve, as the 
size of the sample is increased. To reach a true curve, not only would an infinite number 
of individuals have to be placed in each class, but the number of classes (arrays) into 
which the population is divided must be made infinite. Consequently, it should be 
clear that the concept of a frequency curve includes that of a hypothetical infinite 
population, distributed according to a mathematical law, represented by the curve. 
This law is specified by assigning to each element of the abscissa the corresponding 
element of probability. Thus, in the case of the normal distribution, the probability
of an observation falling in the range dx, is

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx,$$

in which expression x is the value of the variate, while m, the mean, and $\sigma$, the standard
deviation, are the two parameters by which the hypothetical population is specified.
If a sample of n be taken from such a population, the data comprise n independent facts.
The statistical process of the reduction of these data is designed to extract from them
all relevant information respecting the values of m and $\sigma$, and to reject all other
information as irrelevant.
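
What such a reduction amounts to for the normal population can be shown in a few lines. The sketch below is an illustration under assumptions made here: the sample is simulated, the function names are hypothetical, and the comparison is made through the logarithm of the likelihood as defined in the Definitions. It shows that three quantities, the number in the sample, the sum, and the sum of squares, reproduce the log-likelihood of the full sample for any trial values of m and σ.

```python
import math
import random

def log_likelihood_full(xs, m, sigma):
    """Log-likelihood of (m, sigma) computed from every observation."""
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - (x - m) ** 2 / (2 * sigma ** 2) for x in xs)

def log_likelihood_reduced(n, sum_x, sum_x2, m, sigma):
    """The same quantity computed from n, S(x) and S(x^2) alone."""
    sum_sq_dev = sum_x2 - 2 * m * sum_x + n * m * m
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - sum_sq_dev / (2 * sigma ** 2)

random.seed(1)
sample = [random.gauss(10.0, 2.0) for _ in range(500)]  # simulated data

# The reduction: 500 independent facts replaced by three quantities.
n = len(sample)
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)

for m, sigma in [(9.5, 1.8), (10.0, 2.0), (10.4, 2.5)]:
    print(log_likelihood_full(sample, m, sigma),
          log_likelihood_reduced(n, sum_x, sum_x2, m, sigma))
```

Everything else in the sample, such as the order of the observations, leaves the likelihood of m and σ untouched, and is in that sense irrelevant to their estimation.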





It should be noted that there is no falsehood in interpreting any set of independent 
measurements as a random sample from an infinite population; for any such set of 
numbers are a random sample from the totality of numbers produced by the same 
matrix of causal conditions: the hypothetical population which we are studying is an 
aspect of the totality of the effects of these conditions, of whatever nature they may be. 
The postulate of randomness thus resolves itself into the question, “ Of what population 
is this a random sample ? ” which must frequently be asked by every practical statistician. 

It will be seen from the above examples that the process of the reduction of data is, 
even in the simplest cases, performed by interpreting the available observations as a 
sample from a hypothetical infinite population ; this is a fortiori the case when we have 
more than one variate, as when we are seeking the values of coefficients of correlation. 
There is one point, however, which may be briefly mentioned here in advance, as it 
has been the cause of some confusion. In the example of the frequency curve mentioned 
above, we took it for granted that the values of both the mean and the standard deviation 
of the population were relevant to the inquiry. This is often the case, but it sometimes 
happens that only one of these quantities, for example the standard deviation, is required 
for discussion. In the same way an infinite normal population of two correlated variates 
will usually require five parameters for its specification, the two means, the two standard 
deviations, and the correlation ; of these often only the correlation is required, or if not 
alone of interest, it is discussed without reference to the other four quantities. In such 
cases an alteration has been made in what is, and what is not, relevant, and it is not 
surprising that certain small corrections should appear, or not, according as the other 
parameters of the hypothetical surface are or are not deemed relevant. Even more 
clearly is this discrepancy shown when, as in the treatment of such fourfold tables as 
exhibit the recovery from smallpox of vaccinated and unvaccinated patients, the method 
of one school of statisticians treats the proportion of vaccinated as relevant, while 
others dismiss it as irrelevant to the inquiry. (3.) 

3. The Problems of Statistics. 

The problems which arise in reduction of data may be conveniently divided into three 
types:— 

(1) Problems of Specification. These arise in the choice of the mathematical form of
the population.

(2) Problems of Estimation. These involve the choice of methods of calculating from
a sample statistical derivates, or as we shall call them statistics, which are designed
to estimate the values of the parameters of the hypothetical population.

(3) Problems of Distribution. These include discussions of the distribution of
statistics derived from samples, or in general any functions of quantities whose
distribution is known.

It will be clear that when we know (1) what parameters are required to specify the
population from which the sample is drawn, (2) how best to calculate from the sample
estimates of these parameters, and (3) the exact form of the distribution, in different 
samples, of our derived statistics, then the theoretical aspect of the treatment of any 
particular body of data has been completely elucidated. 

As regards problems of specification, these are entirely a matter for the practical
statistician, for those cases where the qualitative nature of the hypothetical population
is known do not involve any problems of this type. In other cases we may know by
experience what forms are likely to be suitable, and the adequacy of our choice may
be tested a posteriori. We must confine ourselves to those forms which we know how
to handle, or for which any tables which may be necessary have been constructed.
More or less elaborate forms will be suitable according to the volume of the data.
Evidently these are considerations the nature of which may change greatly during the
work of a single generation. We may instance the development by Pearson of a very
extensive system of skew curves, the elaboration of a method of calculating their parameters,
and the preparation of the necessary tables, a body of work which has enormously
extended the power of modern statistical practice, and which has been, by pertinacity
and inspiration alike, practically the work of a single man. Nor is the introduction of
the Pearsonian system of frequency curves the only contribution which their author has
made to the solution of problems of specification: of even greater importance is the
introduction of an objective criterion of goodness of fit. For empirical as the specification
of the hypothetical population may be, this empiricism is cleared of its dangers if
we can apply a rigorous and objective test of the adequacy with which the proposed
population represents the whole of the available facts. Once a statistic, suitable for
applying such a test, has been chosen, the exact form of its distribution in random
samples must be investigated, in order that we may evaluate the probability that a
worse fit should be obtained from a random sample of a population of the type considered.
The possibility of developing complete and self-contained tests of goodness of
fit deserves very careful consideration, since therein lies our justification for the free
use which is made of empirical frequency formulae. Problems of distribution of great
mathematical difficulty have to be faced in this direction.
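
The probability "that a worse fit should be obtained from a random sample" can be approximated by brute force where the exact distribution is not to hand. The sketch below is a modern illustration and not a method of the paper: it draws repeated random samples from a fully specified population, here an unbiased die chosen as a hypothetical example, and counts how often the Pearsonian $\chi^2$ equals or exceeds the value observed. The function names and default settings are assumptions of the sketch.

```python
import random

def chi_square(observed, expected):
    """Pearson's measure of discrepancy, S((o - e)^2 / e)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulated_p_value(observed, probs, trials=2000, seed=0):
    """Proportion of random samples fitting as badly as, or worse than, the data."""
    rng = random.Random(seed)
    n = sum(observed)
    expected = [n * p for p in probs]
    target = chi_square(observed, expected)
    classes = range(len(probs))
    worse = 0
    for _ in range(trials):
        counts = [0] * len(probs)
        for c in rng.choices(classes, weights=probs, k=n):
            counts[c] += 1
        if chi_square(counts, expected) >= target:
            worse += 1
    return worse / trials

die = [1 / 6] * 6
print(simulated_p_value([12, 9, 11, 8, 10, 10], die))  # hypothetical throws
```

A large proportion indicates that the proposed population represents the observations adequately; a very small one that the specification should be revised. The sketch covers only the case of a population specified in advance, without parameters fitted from the sample.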

Although problems of estimation and of distribution may be studied separately, they 
are intimately related in the development of statistical methods. Logically problems of 
distribution should have prior consideration, for the study of the random distribution of 
different suggested statistics, derived from samples of a given size, must guide us in the 
choice of which statistic it is most profitable to calculate. The fact is, however, that 
very little progress has been made in the study of the distribution of statistics derived 
from samples. In 1900 Pearson (16) gave the exact form of the distribution of x*> the 
Pearsonian test of goodness of fit, and in 1915 the same author published (18) a similar 
result of more general scope, valid when the observations are regarded as subject to 
linear constraints. By an easy adaptation (17) the tables of probability derived from 
this formula may be made available for the more numerous cases in which linear constraints
are imposed upon the hypothetical population by the means which we employ
in its reconstruction. The distribution of the mean of samples of n from a normal
population has long been known, but in 1908 “Student” (4) broke new ground by
calculating the distribution of the ratio which the deviation of the mean from its population
value bears to the standard deviation calculated from the sample. At the same
time he gave the exact form of the distribution in samples of the standard deviation. 
In 1915 Fisher (5) published the curve of distribution of the correlation coefficient for 
the standard method of calculation, and in 1921 (6) he published the corresponding 
series of curves for intraclass correlations. The brevity of this list is emphasised by the 
absence of investigation of other important statistics, such as the regression coefficients, 
multiple correlations, and the correlation ratio. A formula for the probable error of any 
statistic is, of course, a practical necessity, if that statistic is to be of service: and in 
the majority of cases such formulae have been found, chiefly by the labours of Pearson 
and his school, by a first approximation, which describes the distribution with sufficient 
accuracy if the sample is sufficiently large. Problems of distribution, other than the
distribution of statistics, used to be not uncommon as examination problems in probability,
and the physical importance of problems of this type may be exemplified by the
chemical laws of mass action, by the statistical mechanics of Gibbs, developed by
Jeans in its application to the theory of gases, by the electron theory of Lorentz, and
by Planck’s development of the theory of quanta, although in all these applications
the methods employed have been, from the statistical point of view, relatively
simple. 

The discussions of theoretical statistics may be regarded as alternating between 
problems of estimation and problems of distribution. In the first place a method of
calculating one of the population parameters is devised from common-sense considerations;
we next require to know its probable error, and therefore an approximate solution
of the distribution, in samples, of the statistic calculated. It may then become apparent 
that other statistics may be used as estimates of the same parameter. When the 
probable errors of these statistics are compared, it is usually found that, in large samples, 
one particular method of calculation gives a result less subject to random errors than 
those given by other methods of calculation. Attacking the problem more thoroughly, 
and calculating the surface of distribution of any two statistics, we may find that the 
whole of the relevant information contained in one is contained in the other: or, in 
other words, that when once we know the other, knowledge of the first gives us no 
further information as to the value of the parameter. Finally it may be possible to 
prove, as in the case of the Mean Square Error, derived from a sample of normal population
(7), that a particular statistic summarises the whole of the information relevant
to the corresponding parameter, which the sample contains. In such a case the problem
of estimation is completely solved. 





4. Criteria of Estimation. 

The common-sense criterion employed in problems of estimation may be stated thus:
That when applied to the whole population the derived statistic should be equal to the
parameter. This may be called the Criterion of Consistency. It is often the only test
applied: thus, in estimating the standard deviation of a normally distributed population,
from an ungrouped sample, either of the two statistics

$$\sigma_1 = \frac{1}{n}\sqrt{\frac{\pi}{2}}\; S\left(|x-\bar{x}|\right) \qquad \text{(Mean error)}$$

and

$$\sigma_2 = \sqrt{\frac{1}{n}\, S\left(x-\bar{x}\right)^{2}} \qquad \text{(Mean square error)}$$

will lead to the correct value, $\sigma$, when calculated from the whole population. They both
thus satisfy the criterion of consistency, and this has led many computers to use the
first formula, although the result of the second has 14 per cent. greater weight (7), and
the labour of increasing the number of observations by 14 per cent. can seldom be less
than that of applying the more accurate formula.

Consideration of the above example will suggest a second criterion, namely :—That in 
large samples, when the distributions of the statistics tend to normality, that statistic 
is to be chosen which has the least probable error. 

This may be called the Criterion of Efficiency. It is evident that if for large samples
one statistic has a probable error double that of a second, while both are proportional
to $n^{-\frac{1}{2}}$, then the first method applied to a sample of 4n values will be no more accurate
than the second applied to a sample of any n values. If the second method makes use
of the whole of the information available, the first makes use of only one-quarter of it,
and its efficiency may therefore be said to be 25 per cent. To calculate the efficiency of
any given method, we must therefore know the probable error of the statistic calculated 
by that method, and that of the most efficient statistic which could be used. The 
square of the ratio of these two quantities then measures the efficiency. 
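
The definition of efficiency as a squared ratio of probable errors can be tried numerically on the two estimates of the standard deviation discussed in this section. What follows is only a sketch under stated assumptions: the function names, the sample size of 50, the number of repetitions and the seed are all illustrative choices, and the simulation is not part of the original argument. The large-sample value 1/(π − 2) for the normal curve is a standard result, and corresponds to the 14 per cent. greater weight of the second formula.

```python
import math
import random
import statistics

def efficiency(probable_error, best_probable_error):
    """Efficiency as the squared ratio of the best probable error to the one in hand."""
    return (best_probable_error / probable_error) ** 2

def mean_error_estimate(sample):
    """sigma_1: sqrt(pi/2) times the mean absolute deviation from the sample mean."""
    m = statistics.fmean(sample)
    return math.sqrt(math.pi / 2.0) * sum(abs(x - m) for x in sample) / len(sample)

def mean_square_error_estimate(sample):
    """sigma_2: square root of the mean squared deviation from the sample mean."""
    m = statistics.fmean(sample)
    return math.sqrt(sum((x - m) ** 2 for x in sample) / len(sample))

def simulated_efficiency(n=50, repetitions=4000, seed=1922):
    """Monte Carlo efficiency of sigma_1 relative to sigma_2 for normal samples.

    The sample size, repetition count and seed are arbitrary illustrative values.
    Probable errors are proportional to standard deviations, so the ratio of
    standard deviations across repetitions is used.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        first.append(mean_error_estimate(sample))
        second.append(mean_square_error_estimate(sample))
    return efficiency(statistics.pstdev(first), statistics.pstdev(second))

# Large-sample efficiency of the mean error for the normal curve, about 0.876.
ASYMPTOTIC_EFFICIENCY = 1.0 / (math.pi - 2.0)

if __name__ == "__main__":
    print(efficiency(2.0, 1.0))          # a doubled probable error: one quarter
    print(ASYMPTOTIC_EFFICIENCY)
    print(simulated_efficiency())
```

A probable error double that of the best statistic gives `efficiency(2.0, 1.0)`, that is one quarter, in agreement with the text; the simulated figure should lie near the large-sample value but will vary with the seed.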

The criterion of efficiency is still to some extent incomplete, for different 
methods of calculation may tend to agreement for large samples, and yet differ for 
all finite samples. The complete criterion suggested by our work on the mean 
square error (7) is:— 

That the statistic chosen should summarise the whole of the relevant information 
supplied by the sample. 

This may be called the Criterion of Sufficiency.

In mathematical language we may interpret this statement by saying that if $\theta$ be
the parameter to be estimated, $\theta_1$ a statistic which contains the whole of the information
as to the value of $\theta$, which the sample supplies, and $\theta_2$ any other statistic, then the
surface of distribution of pairs of values of $\theta_1$ and $\theta_2$, for a given value of $\theta$, is such that
for a given value of $\theta_1$, the distribution of $\theta_2$ does not involve $\theta$. In other words, when
$\theta_1$ is known, knowledge of the value of $\theta_2$ throws no further light upon the value of $\theta$.

It may be shown that a statistic which fulfils the criterion of sufficiency will also
fulfil the criterion of efficiency, when the latter is applicable. For, if this be so, the
distribution of the statistics will in large samples be normal, the standard deviations
being proportional to $n^{-\frac{1}{2}}$. Let this distribution be




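$$df = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-r^{2}}}\, \exp\left[-\frac{1}{2(1-r^{2})}\left\{\frac{(\theta_1-\theta)^{2}}{\sigma_1^{2}} - \frac{2r(\theta_1-\theta)(\theta_2-\theta)}{\sigma_1\sigma_2} + \frac{(\theta_2-\theta)^{2}}{\sigma_2^{2}}\right\}\right] d\theta_1\, d\theta_2,$$

then the distribution of $\theta_1$ is

$$df = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{(\theta_1-\theta)^{2}}{2\sigma_1^{2}}}\, d\theta_1,$$

so that for a given value of $\theta_1$, the distribution of $\theta_2$ is

$$df = \frac{1}{\sigma_2\sqrt{2\pi}\sqrt{1-r^{2}}}\, \exp\left[-\frac{1}{2(1-r^{2})}\left\{\frac{\theta_2-\theta}{\sigma_2} - \frac{r(\theta_1-\theta)}{\sigma_1}\right\}^{2}\right] d\theta_2;$$

and if this does not involve $\theta$, we must have

$$r\sigma_2 = \sigma_1;$$

showing that $\sigma_1$ is necessarily less than $\sigma_2$, and that the efficiency of $\theta_2$ is measured by
$r^{2}$, when $r$ is its correlation in large samples with $\theta_1$.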

Besides this case we shall see that the criterion of sufficiency is also applicable to finite 
samples, and to those cases when the weight of a statistic is not proportional to the 
number of the sample from which it is calculated. 


5. Examples of the Use of the Criterion of Consistency. 

In certain cases the criterion of consistency is sufficient for the solution of problems 
of estimation. An example of this occurs when a fourfold table is interpreted as representing
the double dichotomy of a normal surface. In this case the dichotomic ratios
of the two variates, together with the correlation, completely specify the four fractions 
into which the population is divided. If these are equated to the four fractions into 
which the sample is divided, the correlation is determined uniquely. 

In other cases where a small correction has to be made, the amount of the correction 
is not of sufficient importance to justify any great refinement in estimation, and it is 
sufficient to calculate the discrepancy which appears when the uncorrected method is 
applied to the whole population. Of this nature is Sheppard’s correction for grouping,
and it will illustrate this use of the criterion of consistency if we derive formulae for
this correction without approximation.

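Let $\xi$ be the value of the variate at the mid point of any group, a the interval of
grouping, and x the true value of the variate at any point, then the $h$th moment of an
infinite grouped sample is

$$\sum_{p=-\infty}^{\infty} \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} \xi^{h} f(x)\,dx,$$

in which $f(x)\,dx$ is the frequency, in any element $dx$, of the ungrouped population, and

$$\xi = \left(p + \frac{\theta}{2\pi}\right) a,$$

p being any integer.

Evidently the $h$th moment is periodic in $\theta$, we will therefore equate it to

$${}_hA_0 + {}_hA_1 \sin\theta + {}_hA_2 \sin 2\theta + \dots + {}_hB_1 \cos\theta + {}_hB_2 \cos 2\theta + \dots$$

Then

$${}_hA_0 = \frac{1}{2\pi}\sum_{p=-\infty}^{\infty}\int_0^{2\pi} d\theta \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} \xi^{h} f(x)\,dx,$$

$${}_hA_s = \frac{1}{\pi}\sum_{p=-\infty}^{\infty}\int_0^{2\pi} \sin s\theta\, d\theta \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} \xi^{h} f(x)\,dx,$$

$${}_hB_s = \frac{1}{\pi}\sum_{p=-\infty}^{\infty}\int_0^{2\pi} \cos s\theta\, d\theta \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} \xi^{h} f(x)\,dx.$$

But

$$\theta = \frac{2\pi}{a}\,\xi - 2\pi p,$$

therefore

$$d\theta = \frac{2\pi}{a}\, d\xi, \qquad \sin s\theta = \sin\frac{2\pi s\xi}{a}, \qquad \cos s\theta = \cos\frac{2\pi s\xi}{a},$$

hence

$${}_hA_0 = \frac{1}{a}\int_{-\infty}^{\infty} \xi^{h}\, d\xi \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} f(x)\,dx = \frac{1}{a}\int_{-\infty}^{\infty} f(x)\,dx \int_{x-\frac{1}{2}a}^{x+\frac{1}{2}a} \xi^{h}\, d\xi.$$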



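Inserting the values 1, 2, 3 and 4 for $h$, we obtain for the aperiodic terms of the four
moments of the grouped population

$${}_1A_0 = \int_{-\infty}^{\infty} x\, f(x)\,dx,$$

$${}_2A_0 = \int_{-\infty}^{\infty} \left(x^{2} + \frac{a^{2}}{12}\right) f(x)\,dx,$$

$${}_3A_0 = \int_{-\infty}^{\infty} \left(x^{3} + \frac{a^{2}x}{4}\right) f(x)\,dx,$$

$${}_4A_0 = \int_{-\infty}^{\infty} \left(x^{4} + \frac{a^{2}x^{2}}{2} + \frac{a^{4}}{80}\right) f(x)\,dx.$$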

If we ignore the periodic terms, these equations lead to the ordinary Sheppard 
corrections for the second and fourth moment. The nature of the approximation involved 
is brought out by the periodic terms. In the absence of high contact at the ends of the 
curve, the contribution of these will, of course, include the terms given in a recent paper 
by Pearson (8) ; but even with high contact it is of interest to see for what degree of 
coarseness of grouping the periodic terms become sensible. 
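
The size of the periodic terms can be examined directly by computing the moments of an infinite grouped normal sample and subtracting the aperiodic (Sheppard) terms. The sketch below is illustrative only: the function names and the truncation at twelve standard deviations are assumptions, the normal curve is taken with mean zero, and the aperiodic terms used are the usual Sheppard values $\sigma^{2} + a^{2}/12$ and $3\sigma^{4} + a^{2}\sigma^{2}/2 + a^{4}/80$ for the second and fourth moments.

```python
import math

def normal_cdf(x, sigma=1.0):
    """Cumulative normal distribution with mean 0 and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def grouped_moment(h, a, theta, sigma=1.0, span=12.0):
    """h-th moment of an infinite normal sample grouped in intervals of width a.

    Group mid-points lie at xi = (p + theta / (2 pi)) * a for integer p.
    Groups lying beyond roughly `span` standard deviations are dropped
    (an assumption of this sketch; their frequency is negligible).
    """
    reach = int(span * sigma / a) + 2
    total = 0.0
    for p in range(-reach, reach + 1):
        xi = (p + theta / (2.0 * math.pi)) * a
        frequency = normal_cdf(xi + 0.5 * a, sigma) - normal_cdf(xi - 0.5 * a, sigma)
        total += xi ** h * frequency
    return total

def aperiodic_term(h, a, sigma=1.0):
    """Aperiodic term of the h-th grouped moment (h = 1..4), normal curve about its mean."""
    terms = {
        1: 0.0,
        2: sigma ** 2 + a ** 2 / 12.0,
        3: 0.0,
        4: 3.0 * sigma ** 4 + a ** 2 * sigma ** 2 / 2.0 + a ** 4 / 80.0,
    }
    return terms[h]

if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0):
        residual = grouped_moment(2, a, 0.0) - aperiodic_term(2, a)
        print(a, residual)   # periodic part of the second moment at theta = 0
```

With the group interval at half the standard deviation the residual is far below rounding error; it becomes visible only as the interval approaches and exceeds the standard deviation.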

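Now

$${}_1A_s = \frac{1}{\pi}\sum_{p=-\infty}^{\infty}\int_0^{2\pi} \sin s\theta\, d\theta \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} \xi f(x)\,dx$$

$$= \frac{2}{a}\int_{-\infty}^{\infty} \xi \sin\frac{2\pi s\xi}{a}\, d\xi \int_{\xi-\frac{1}{2}a}^{\xi+\frac{1}{2}a} f(x)\,dx$$

$$= \frac{2}{a}\int_{-\infty}^{\infty} f(x)\,dx \int_{x-\frac{1}{2}a}^{x+\frac{1}{2}a} \xi \sin\frac{2\pi s\xi}{a}\, d\xi.$$

But

$$\frac{2}{a}\int_{x-\frac{1}{2}a}^{x+\frac{1}{2}a} \xi \sin\frac{2\pi s\xi}{a}\, d\xi = -\frac{a}{\pi s}\cos \pi s\, \cos\frac{2\pi s x}{a},$$

therefore

$${}_1A_s = (-)^{s+1}\,\frac{a}{\pi s}\int_{-\infty}^{\infty} \cos\frac{2\pi s x}{a}\, f(x)\,dx;$$

similarly the other terms of the different moments may be calculated.

For a normal curve referred to the true mean

$${}_1A_s = (-)^{s+1}\,\frac{2\epsilon}{s}\, e^{-\frac{s^{2}\sigma^{2}}{2\epsilon^{2}}},$$

$${}_1B_s = 0,$$

in which

$$a = 2\pi\epsilon.$$

The error of the mean is therefore

$$2\epsilon\left(e^{-\frac{\sigma^{2}}{2\epsilon^{2}}}\sin\theta - \tfrac{1}{2}\, e^{-\frac{4\sigma^{2}}{2\epsilon^{2}}}\sin 2\theta + \tfrac{1}{3}\, e^{-\frac{9\sigma^{2}}{2\epsilon^{2}}}\sin 3\theta - \dots\right).$$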


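To illustrate a coarse grouping, take the group interval equal to the standard deviation:
then

$$\epsilon = \frac{\sigma}{2\pi},$$

and the error is

$$\frac{\sigma}{\pi}\, e^{-2\pi^{2}} \sin\theta$$

with sufficient accuracy. The standard error of the mean being $\frac{\sigma}{\sqrt{n}}$, we may calculate
the size of the sample for which the error due to the periodic terms becomes equal to
one-tenth of the standard error, by putting

$$\frac{\sigma}{10\sqrt{n}} = \frac{\sigma}{\pi}\, e^{-2\pi^{2}},$$

whence

$$n = \frac{\pi^{2}}{100}\, e^{4\pi^{2}} = 13{,}790 \times 10^{12}.$$

For the second moment

$${}_2B_s = (-)^{s}\, 4\left(\sigma^{2} + \frac{\epsilon^{2}}{s^{2}}\right) e^{-\frac{s^{2}\sigma^{2}}{2\epsilon^{2}}},$$

and, if we put

$$\frac{\sqrt{2}\,\sigma^{2}}{10\sqrt{n}} = 4\sigma^{2} e^{-2\pi^{2}},$$

there results

$$n = 175 \times 10^{12}.$$

The error, while still very minute, is thus more important for the second than for
the first moment.

For the third moment

$${}_3A_s = (-)^{s}\,\frac{6\sigma^{4}s}{\epsilon}\left\{1 + \frac{\epsilon^{2}}{s^{2}\sigma^{2}} - (\pi^{2}s^{2}-6)\frac{\epsilon^{4}}{3s^{4}\sigma^{4}}\right\} e^{-\frac{s^{2}\sigma^{2}}{2\epsilon^{2}}},$$

putting

$$\frac{\sqrt{15}\,\sigma^{3}}{10\sqrt{n}} = 12\pi\sigma^{3} e^{-2\pi^{2}},$$

$$n = 147 \times 10^{11}.$$

While for the fourth moment

$${}_4B_s = (-)^{s+1}\,\frac{8\sigma^{6}s^{2}}{\epsilon^{2}}\left\{1 - (\pi^{2}s^{2}-3)\frac{\epsilon^{4}}{s^{4}\sigma^{4}} - (\pi^{2}s^{2}-6)\frac{\epsilon^{6}}{s^{6}\sigma^{6}}\right\} e^{-\frac{s^{2}\sigma^{2}}{2\epsilon^{2}}},$$

so that, if we put

$$\frac{\sqrt{96}\,\sigma^{4}}{10\sqrt{n}} = 32\pi^{2}\sigma^{4} e^{-2\pi^{2}},$$

$$n = \frac{3}{3200\pi^{4}}\, e^{4\pi^{2}} = 1.34 \times 10^{12}.$$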



In a similar manner the exact form of Sheppard’s correction may be found for other
curves; for the normal curve we may say that the periodic terms are exceedingly minute
so long as a is less than $\sigma$, though they increase very rapidly if a is increased beyond
this point. They are of increasing importance as higher moments are used, not only
absolutely, but relatively to the increasing probable errors of the higher moments.
The principle upon which the correction is based is merely to find the error when the
moments are calculated from an infinite grouped sample; the corrected moment therefore
fulfils the criterion of consistency, and so long as the correction is small no greater
refinement is required.
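
The sample sizes at which the periodic error reaches one-tenth of the standard error, for a group interval equal to the standard deviation, follow from a single line of arithmetic for each moment. The sketch below assumes the leading periodic amplitudes $(\sigma/\pi)e^{-2\pi^{2}}$, $4\sigma^{2}e^{-2\pi^{2}}$, $12\pi\sigma^{3}e^{-2\pi^{2}}$ and $32\pi^{2}\sigma^{4}e^{-2\pi^{2}}$ for the first four moments, and the large-sample standard errors $\sigma/\sqrt{n}$, $\sqrt{2}\sigma^{2}/\sqrt{n}$, $\sqrt{15}\sigma^{3}/\sqrt{n}$ and $\sqrt{96}\sigma^{4}/\sqrt{n}$ of the moments of a normal sample; the function and variable names are hypothetical.

```python
import math

def sample_size_for_tenth(standard_error_coefficient, periodic_amplitude):
    """Sample size n at which periodic_amplitude equals one-tenth of
    standard_error_coefficient / sqrt(n)."""
    return (standard_error_coefficient / (10.0 * periodic_amplitude)) ** 2

# Group interval equal to the standard deviation, sigma taken as 1,
# so that exp(-sigma^2 / (2 epsilon^2)) = exp(-2 pi^2).
decay = math.exp(-2.0 * math.pi ** 2)

# moment order -> (standard error coefficient, leading periodic amplitude)
cases = {
    1: (1.0, decay / math.pi),
    2: (math.sqrt(2.0), 4.0 * decay),
    3: (math.sqrt(15.0), 12.0 * math.pi * decay),
    4: (math.sqrt(96.0), 32.0 * math.pi ** 2 * decay),
}

sizes = {h: sample_size_for_tenth(se, amplitude) for h, (se, amplitude) in cases.items()}

if __name__ == "__main__":
    for h, n in sizes.items():
        print(h, f"{n:.3e}")
```

The required sample size falls by roughly four orders of magnitude between the first and the fourth moment, which is the sense in which the periodic terms grow in importance for the higher moments.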

Perhaps the most extended use of the criterion of consistency has been developed by
Pearson in the “Method of Moments.” In this method, which is without question of
great practical utility, different forms of frequency curves are fitted by calculating as
many moments of the sample as there are parameters to be evaluated. The parameters
chosen are those of an infinite population of the specified type having the same moments
as those calculated from the sample.

The system of curves developed by Pearson has four variable parameters, and may
be fitted by means of the first four moments. For this purpose it is necessary to confine
attention to curves of which the first four moments are finite; further, if the accuracy
of the fourth moment should increase with the size of the sample, that is, if its probable
error should not be infinitely great, the first eight moments must be finite. This
restriction requires that the class of distribution in which this condition is not fulfilled
should be set aside as “heterotypic,” and that the fourth moment should become
practically valueless as this class is approached. It should be made clear, however,
that there is nothing anomalous about these so-called “heterotypic” distributions
except the fact that the method of moments cannot be applied to them. Moreover,
for that class of distribution to which the method can be applied, it has not
been shown, except in the case of the normal curve, that the best values will be
obtained by the method of moments. The method will, in these cases, certainly be
serviceable in yielding an approximation, but to discover whether this approximation
is a good or a bad one, and to improve it, if necessary, a more adequate criterion is
required.

A single example will be sufficient to illustrate the practical difficulty alluded to
above. If a point P lie at known (unit) distance from a straight line AB, and lines be
drawn at random through P, then the points of intersection with
AB will be distributed so that the frequency in any range dx is

$$df = \frac{1}{\pi}\,\frac{dx}{1+(x-m)^{2}},$$

in which x is the distance of the infinitesimal range dx from a fixed point O on the line,
and m is the distance, from this point, of the foot of the perpendicular PM. The distribution
will be a symmetrical one (Type VII.) having its centre at x = m (fig. 1). It is
therefore a perfectly definite problem to estimate the value of m (to find the best value of
m) from a random sample of values of x. We have stated the problem in its simplest
possible form: only one parameter is required, the middle point of the distribution.



[Fig. 1. Frequency curves A and B; curve A is $df = \frac{1}{\pi}\,\frac{dx}{1+x^{2}}$.]


By the method of moments, this should be given by the first moment, that is by the
mean of the observations: such would seem to be at least a good estimate. It is,
however, entirely valueless. The distribution of the mean of such samples is in fact the
same, identically, as that of a single observation. In taking the mean of 100 values of
x, we are no nearer obtaining the value of m than if we had chosen any value of x out
of the 100. The problem, however, is not in the least an impracticable one: clearly
from a large sample we ought to be able to estimate the centre of the distribution with
some precision; the mean, however, is an entirely useless statistic for the purpose.
By taking the median of a large sample, a fair approximation is obtained, for the standard
error of the median of a large sample of n is $\frac{\pi}{2\sqrt{n}}$, which, alone, is enough to show that
by adopting adequate statistical methods it must be possible to estimate the value for
m, with increasing accuracy, as the size of the sample is increased.
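
The contrast between the mean and the median for this curve is easily seen by simulation. The following is a sketch only, with hypothetical function names and arbitrary choices of sample size, repetition count and seed. Since the mean of such samples has no finite standard deviation, its spread is summarised by the interquartile range, which for a single observation from this curve is 2.

```python
import math
import random
import statistics

def cauchy_sample(n, m, rng):
    """n values of x with frequency dx / (pi * (1 + (x - m)^2))."""
    return [m + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def compare_mean_and_median(n=101, repetitions=2000, m=0.0, seed=7):
    """Return (interquartile range of the sample means,
    standard deviation of the sample medians) over repeated samples.

    n, repetitions and seed are illustrative values, not taken from the text.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repetitions):
        sample = cauchy_sample(n, m, rng)
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    lower, _, upper = statistics.quantiles(means, n=4)
    return upper - lower, statistics.pstdev(medians)

if __name__ == "__main__":
    spread_of_mean, sd_of_median = compare_mean_and_median()
    print(spread_of_mean)                        # close to 2, as for one observation
    print(sd_of_median, math.pi / (2.0 * math.sqrt(101)))
```

The interquartile range of the means stays near that of a single observation however large n is taken, while the standard deviation of the medians falls in proportion to the inverse square root of n.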

This example serves also to illustrate the practical difficulty which observers often 
find, that a few extreme observations appear to dominate the value of the mean. In 
these cases the rejection of extreme values is often advocated, and it may often happen 
that gross errors are thus rejected. As a statistical measure, however, the rejection of
observations is too crude to be defended: and unless there are other reasons for rejection
than mere divergence from the majority, it would be more philosophical to accept
these extreme values, not as gross errors, but as indications that the distribution of
errors is not normal. As we shall show, the only Pearsonian curve for which the mean
is the best statistic for locating the curve, is the normal or gaussian curve of errors. If
the curve is not of this form the mean is not necessarily, as we have seen, of any value
whatever. The determination of the true curves of variation for different types of work
is therefore of great practical importance, and this can only be done by different workers 
recording their data in full without rejections, however they may please to treat the 
data so recorded. Assuredly an observer need be exposed to no criticism, if after 
recording data which are not probably normal in distribution, he prefers to adopt some 
value other than the arithmetic mean. 


6. Formal Solution of Problems of Estimation.

The form in which the criterion of sufficiency has been presented is not of direct 
assistance in the solution of problems of estimation. For it is necessary first to know 
the statistic concerned and its surface of distribution, with an infinite number of other 
statistics, before its sufficiency can be tested. For the solution of problems of 
estimation we require a method which for each particular problem will lead us 
automatically to the statistic by which the criterion of sufficiency is satisfied. Such a 
method is, I believe, provided by the Method of Maximum Likelihood, although I am 
not satisfied as to the mathematical rigour of any proof which I can put forward to 
that effect. Readers of the ensuing pages are invited to form their own opinion as 
to the possibility of the method of the maximum likelihood leading in any case to an 
insufficient statistic. For my own part I should gladly have withheld publication until 
a rigorously complete proof could have been formulated ; but the number and variety 
of the new results which the method discloses press for publication, and at the same 
time I am not insensible of the advantage which accrues to Applied Mathematics from 
the co-operation of the Pure Mathematician, and this co-operation is not infrequently 
called forth by the very imperfections of writers on Applied Mathematics. 

If in any distribution involving unknown parameters $\theta_1, \theta_2, \theta_3, \dots$, the chance of
an observation falling in the range dx be represented by

$$f(x, \theta_1, \theta_2, \dots)\,dx,$$

then the chance that in a sample of n, $n_1$ fall in the range $dx_1$, $n_2$ in the range $dx_2$, and
so on, will be

$$\frac{n!}{\Pi(n_p!)}\,\Pi\left\{f(x_p, \theta_1, \theta_2, \dots)\,dx_p\right\}^{n_p}.$$

The method of maximum likelihood consists simply in choosing that set of values
for the parameters which makes this quantity a maximum, and since in this expression
the parameters are only involved in the function f, we have to make

$$S(\log f)$$

a maximum for variations of $\theta_1, \theta_2, \theta_3$, &c. In this form the method is applicable to
the fitting of populations involving any number of variates, and equally to discontinuous
as to continuous distributions.
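
As an illustration of making S(log f) a maximum, the location m of the curve $df = \frac{1}{\pi}\frac{dx}{1+(x-m)^{2}}$ may be found numerically. The sketch below uses a golden-section search, which is a choice made here for brevity and not a method given in the text; it assumes that the likelihood has a single maximum inside the bracket searched, which is placed about the sample median. The function names and the bracket half-width are hypothetical.

```python
import math
import statistics

def log_likelihood(m, sample):
    """S(log f) for the curve df = dx / (pi * (1 + (x - m)^2))."""
    return sum(-math.log(math.pi) - math.log(1.0 + (x - m) ** 2) for x in sample)

def maximise(function, low, high, tolerance=1e-10):
    """Golden-section search for a maximum of `function` on [low, high].

    Assumes a single maximum inside the bracket; with several maxima the
    search may settle on any one of them.
    """
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    c = high - ratio * (high - low)
    d = low + ratio * (high - low)
    while high - low > tolerance:
        if function(c) > function(d):
            high = d          # the maximum lies in [low, d]
        else:
            low = c           # the maximum lies in [c, high]
        c = high - ratio * (high - low)
        d = low + ratio * (high - low)
    return 0.5 * (low + high)

def optimum_location(sample, half_width=1.0):
    """Value of m maximising S(log f), searched within half_width of the median."""
    centre = statistics.median(sample)
    return maximise(lambda m: log_likelihood(m, sample),
                    centre - half_width, centre + half_width)

if __name__ == "__main__":
    sample = [-1.2, -0.3, 0.1, 0.4, 25.0]   # one extreme observation
    print(optimum_location(sample), statistics.fmean(sample))
```

In the small example the single extreme observation drags the mean far from the bulk of the sample, while the value that maximises S(log f) remains among the central observations.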

In order to make clear the distinction between this method and that of Bayes, we 
will apply it to the same type of problem as that which Bayes discussed, in the hope 
of making clear exactly of what kind is the information which a sample is capable of 
supplying. This question naturally first arose, not with respect to populations distributed
in frequency curves and surfaces, but with respect to a population regarded as
divided into two classes only, in fact in problems of probability. A certain proportion,
p, of an infinite population is supposed to be of a certain kind, e.g., “successes,” the
remainder are then “failures.” A sample of n is taken and found to contain x successes
and y failures. The chance of obtaining such a sample is evidently

$$\frac{n!}{x!\,y!}\,p^{x}(1-p)^{y}.$$

Applying the method of maximum likelihood, we have

$$S(\log f) = x \log p + y \log (1-p),$$

whence, differentiating with respect to p, in order to make this quantity a maximum,

$$\frac{x}{p} = \frac{y}{1-p},$$

or

$$\hat{p} = \frac{x}{n}.$$
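
The result may be checked numerically by evaluating x log p + y log (1 − p) over a grid of values of p and picking the largest. This is a minimal sketch; the function names and the grid of 1000 steps are illustrative assumptions.

```python
import math

def binomial_log_likelihood(p, x, y):
    """S(log f) = x log p + y log(1 - p), the constant term being omitted."""
    return x * math.log(p) + y * math.log(1.0 - p)

def grid_optimum(x, y, steps=1000):
    """Value of p on an even grid strictly inside (0, 1) with the largest log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, x, y))

if __name__ == "__main__":
    print(grid_optimum(3, 7))     # x / n for 3 successes in 10
    print(grid_optimum(45, 55))   # x / n for 45 successes in 100
```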


The question then arises as to the accuracy of this determination. This question was
first discussed by Bayes (10), in a form which we may state thus. After observing
this sample, when we know $\hat{p}$, what is the probability that p lies in any range dp? In
other words, what is the frequency distribution of the values of p in populations which
are selected by the restriction that a sample of n taken from each of them yields x
successes. Without further data, as Bayes perceived, this problem is insoluble. To
render it capable of mathematical treatment, Bayes introduced the datum, that among
the populations upon which the experiment was tried, those in which p lay in the range
dp were equally frequent for all equal ranges dp. The probability that the value of p
lay in any range dp was therefore assumed to be simply dp, before the sample was
taken. After the selection effected by observing the sample, the probability is clearly
proportional to

$$p^{x}(1-p)^{y}\,dp.$$

After giving this solution, based upon the particular datum stated, Bayes adds a
scholium the purport of which would seem to be that in the absence of all knowledge
save that supplied by the sample, it is reasonable to assume this particular a priori
distribution of p. The result, the datum, and the postulate implied by the scholium, have
all been somewhat loosely spoken of as Bayes’ Theorem.



The postulate would, if true, be of great importance in bringing an immense variety
of questions within the domain of probability. It is, however, evidently extremely arbitrary.
Apart from evolving a vitally important piece of knowledge, that of the exact
form of the distribution of values of p, out of an assumption of complete ignorance, it is
not even a unique solution. For we might never have happened to direct our attention
to the particular quantity p; we might equally have measured probability upon an
entirely different scale. If, for instance,

$$\sin\theta = 2p - 1,$$

the quantity, $\theta$, measures the degree of probability, just as well as p, and is even, for
some purposes, the more suitable variable. The chance of obtaining a sample of x
successes and y failures is now

$$\frac{n!}{2^{n}\,x!\,y!}\,(1+\sin\theta)^{x}(1-\sin\theta)^{y};$$

applying the method of maximum likelihood,

$$S(\log f) = x\log(1+\sin\theta) + y\log(1-\sin\theta) - n\log 2,$$

and differentiating with respect to $\theta$,

$$\frac{x\cos\theta}{1+\sin\theta} = \frac{y\cos\theta}{1-\sin\theta}, \quad\text{whence}\quad \sin\theta = \frac{x-y}{n},$$

an exactly equivalent solution to that obtained using the variable p. But what a priori
assumption are we to make as to the distribution of $\theta$? Are we to assume that $\theta$ is
equally likely to lie in all equal ranges $d\theta$? In this case the a priori probability will
be $d\theta/\pi$, and that after making the observations will be proportional to

$$(1+\sin\theta)^{x}(1-\sin\theta)^{y}\,d\theta.$$

But if we interpret this in terms of p, we obtain

$$p^{x-\frac{1}{2}}(1-p)^{y-\frac{1}{2}}\,dp,$$

a result inconsistent with that obtained previously. In fact, the distribution previously
assumed for p was equivalent to assuming the special distribution for $\theta$,

$$df = \frac{\cos\theta}{2}\,d\theta,$$

the arbitrariness of which is fully apparent when we use any variable other than p.
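
The difference between the two procedures can be shown with a small calculation. In the sketch below (hypothetical function names, a plain grid search, and the sample of 3 successes and 7 failures chosen only for illustration) the optimum is found once on the scale of p and once on the scale of $\theta$, where $\sin\theta = 2p - 1$, and the two agree. The most probable value of p under a uniform a priori distribution of $\theta$, on the other hand, is the maximum of $p^{x-\frac{1}{2}}(1-p)^{y-\frac{1}{2}}$, which is $(x-\frac{1}{2})/(n-1)$ and not $x/n$.

```python
import math

def argmax_on_grid(function, low, high, steps=20000):
    """Grid point strictly inside (low, high) at which `function` is largest."""
    width = (high - low) / steps
    return max((low + i * width for i in range(1, steps)), key=function)

def optimum_and_modes(x, y):
    """Return, all on the scale of p:
    (optimum found in p, optimum found in theta, mode under a uniform prior in theta)."""
    def log_likelihood_p(p):
        return x * math.log(p) + y * math.log(1.0 - p)

    def log_likelihood_theta(theta):
        # sin(theta) = 2p - 1, so p = (1 + sin(theta)) / 2.
        return log_likelihood_p(0.5 * (1.0 + math.sin(theta)))

    def log_posterior_in_p(p):
        # Uniform a priori distribution of theta, re-expressed as a density in p.
        return (x - 0.5) * math.log(p) + (y - 0.5) * math.log(1.0 - p)

    optimum_p = argmax_on_grid(log_likelihood_p, 0.0, 1.0)
    theta_hat = argmax_on_grid(log_likelihood_theta, -math.pi / 2.0, math.pi / 2.0)
    optimum_from_theta = 0.5 * (1.0 + math.sin(theta_hat))
    mode_uniform_theta = argmax_on_grid(log_posterior_in_p, 0.0, 1.0)
    return optimum_p, optimum_from_theta, mode_uniform_theta

if __name__ == "__main__":
    print(optimum_and_modes(3, 7))
```

With a uniform a priori distribution of p itself the most probable value coincides with the optimum x/n, so that the two a priori assumptions lead to different conclusions from the same sample, while the optimum is the same on either scale.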

In a less obtrusive form the same species of arbitrary assumption underlies the method
known as that of inverse probability. Thus, if the same observed result A might be
the consequence of one or other of two hypothetical conditions X and Y, it is assumed
that the probabilities of X and Y are in the same ratio as the probabilities of A occurring
on the two assumptions, X is true, Y is true. This amounts to assuming that before
A was observed, it was known that our universe had been selected at random from an
infinite population in which X was true in one half, and Y true in the other half.
Clearly such an assumption is entirely arbitrary, nor has any method been put forward
by which such assumptions can be made even with consistent uniqueness. There
is nothing to prevent an irrelevant distinction being drawn among the hypothetical
conditions represented by X, so that we have to consider two hypothetical possibilities
$X_1$ and $X_2$, on both of which A will occur with equal frequency. Such a distinction
should make no difference whatever to our conclusions; but on the principle of inverse
probability it does so, for if previously the relative probabilities were reckoned to be
in the ratio x to y, they must now be reckoned 2x to y. Nor has any criterion been
suggested by which it is possible to separate such irrelevant distinctions from those
which are relevant.

There would be no need to emphasise the baseless character of the assumptions made 
under the titles of inverse probability and Bayes’ Theorem in view of the decisive 
criticism to which they have been exposed at the hands of Boole, Venn, and Chrystal, 
were it not for the fact that the older writers, such as Laplace and Poisson, who accepted 
these assumptions, also laid the foundations of the modern theory of statistics, and have 
introduced into their discussions of this subject ideas of a similar character. I must
indeed plead guilty in my original statement of the Method of the Maximum Likelihood
(9) to having based my argument upon the principle of inverse probability; in the
same paper, it is true, I emphasised the fact that such inverse probabilities were relative
only. That is to say, that while we might speak of one value of p as having an inverse
probability three times that of another value of p, we might on no account introduce
the differential element dp, so as to be able to say that it was three times as probable
that p should lie in one rather than the other of two equal elements. Upon consideration,
therefore, I perceive that the word probability is wrongly used in such a connection:
probability is a ratio of frequencies, and about the frequencies of such values we can
know nothing whatever. We must return to the actual fact that one value of p, of
the frequency of which we know nothing, would yield the observed result three times
as frequently as would another value of p. If we need a word to characterise this
relative property of different values of p, I suggest that we may speak without confusion
of the likelihood of one value of p being thrice the likelihood of another, bearing always
in mind that likelihood is not here used loosely as a synonym of probability, but simply 
to express the relative frequencies with which such values of the hypothetical quantity 
p would in fact yield the observed sample. 

The solution of the problems of calculating from a sample the parameters of the
hypothetical population, which we have put forward in the method of likelihood,
consists, then, simply of choosing such values of these parameters as have the
maximum likelihood. Formally, therefore, it resembles the calculation of the mode of
an inverse frequency distribution. This resemblance is quite superficial: if the scale
of measurement of the hypothetical quantity be altered, the mode must change its
position, and can be brought to have any value, by an appropriate change of scale; but
the optimum, as the position of maximum likelihood may be called, is entirely unchanged
by any such transformation. Likelihood also differs from probability* in that it is not
a differential element, and is incapable of being integrated: it is assigned to a particular
point of the range of variation, not to a particular element of it. There is therefore an
absolute measure of probability in that the unit is chosen so as to make all the elementary 
probabilities add up to unity. There is no such absolute measure of likelihood. It 
may be convenient to assign the value unity to the maximum value, and to measure 
other likelihoods by comparison, but there will then be an infinite number of values 
whose likelihood is greater than one-half. The sum of the likelihoods of admissible 
values will always be infinite. 
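
The convention of assigning the value unity to the maximum may be illustrated for the case of x successes and y failures. The sketch below is illustrative only; the function names and the grid are assumptions, and it requires both x and y to be greater than zero. It computes the likelihood of any p relative to that of the optimum x/n, and lists the extent of the grid values whose likelihood exceeds a given level; every one of the infinitely many values of p between those limits has a likelihood greater than that level.

```python
import math

def relative_likelihood(p, x, y):
    """Likelihood of p for x successes and y failures, scaled to unity at x / n.

    Requires x > 0 and y > 0, so that the optimum lies strictly inside (0, 1).
    """
    best = x / (x + y)
    log_ratio = (x * (math.log(p) - math.log(best))
                 + y * (math.log(1.0 - p) - math.log(1.0 - best)))
    return math.exp(log_ratio)

def range_above(level, x, y, steps=10000):
    """Smallest and largest grid values of p whose relative likelihood exceeds `level`."""
    inside = [i / steps for i in range(1, steps)
              if relative_likelihood(i / steps, x, y) > level]
    return inside[0], inside[-1]

if __name__ == "__main__":
    print(relative_likelihood(0.1, 3, 7))
    print(range_above(0.5, 3, 7))
```

No significance attaches to the length of the range so found, since the likelihood is not a differential element and the range would change in length under a change of scale.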

Our interpretation of Bayes’ problem, then, is that the likelihood of any value of p
is proportional to

$$p^{x}(1-p)^{y},$$

and is therefore a maximum when

$$p = \frac{x}{n},$$

which is the best value obtainable from the sample; we shall term this the optimum
value of p. Other values of p for which the likelihood is not much less cannot, however, 
be deemed unlikely values for the true value of p. We do not, and cannot, know, from 
the information supplied by a sample, anything about the probability that p should lie 
between any named values. 

The reliance to be placed on such a result must depend upon the frequency distribution 
of x, in different samples from the same population. This is a perfectly objective
statistical problem, of the kind we have called problems of distribution ; it is, however, 
capable of an approximate solution, directly from the mathematical form of the 
likelihood. 

* It should be remarked that likelihood, as above defined, is not only fundamentally distinct from
mathematical probability, but also from the logical “probability” by which Mr. Keynes (21) has recently
attempted to develop a method of treatment of uncertain inference, applicable to those cases where we
lack the statistical information necessary for the application of mathematical probability. Although, in
an important class of cases, the likelihood may be held to measure the degree of our rational belief in a
conclusion, in the same sense as Mr. Keynes’ “probability,” yet since the latter quantity is constrained,
somewhat arbitrarily, to obey the addition theorem of mathematical probability, the likelihood is a
quantity which falls definitely outside its scope.

When for large samples the distribution of any statistic, $\theta_1$, tends to normality, we
may write down the chance for a given value of the parameter $\theta$, that $\theta_1$ should lie in
the range $d\theta_1$ in the form

$$\Phi = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(\theta_1-\theta)^{2}}{2\sigma^{2}}}\, d\theta_1.$$

The mean value of $\theta_1$ will be the true value $\theta$, and the standard deviation is $\sigma$, the
sample being assumed sufficiently large for us to disregard the dependence of $\sigma$ upon $\theta$.
The likelihood of any value, $\theta$, is proportional to

$$e^{-\frac{(\theta_1-\theta)^{2}}{2\sigma^{2}}},$$

this quantity having its maximum value, unity, when

$$\theta = \theta_1;$$

for

$$\frac{\partial}{\partial\theta}\log\Phi = \frac{\theta_1-\theta}{\sigma^{2}}.$$

Differentiating now a second time

$$\frac{\partial^{2}}{\partial\theta^{2}}\log\Phi = -\frac{1}{\sigma^{2}}.$$

Now $\Phi$ stands for the total frequency of all samples for which the chosen statistic
has the value $\theta_1$, consequently $\Phi = S'(\phi)$, the summation being taken over all such
samples, where $\phi$ stands for the probability of occurrence of a certain specified sample.
For which we know that

$$\log\phi = C + S(\log f),$$

the summation being taken over the individual members of the sample.

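If now we expand $\log f$ in the form

$$\log f(\theta) = \log f(\theta_1) + (\theta-\theta_1)\frac{\partial}{\partial\theta}\log f(\theta_1) + \frac{(\theta-\theta_1)^{2}}{2!}\frac{\partial^{2}}{\partial\theta^{2}}\log f(\theta_1) + \dots,$$

or

$$\log f = \log f_1 + a\,(\theta-\theta_1) + \frac{b}{2}(\theta-\theta_1)^{2} + \dots,$$

we have

$$\log\phi = C + (\theta-\theta_1)\,S(a) + \tfrac{1}{2}(\theta-\theta_1)^{2}\,S(b) + \dots;$$

now for optimum statistics

$$S(a) = 0,$$

and for sufficiently large samples $S(b)$ differs from $n\bar{b}$ only by a quantity of order $\sqrt{n}\,\sigma_b$;
moreover, $\theta-\theta_1$ being of order $n^{-\frac{1}{2}}$, the only terms in $\log\phi$ which are not reduced
without limit, as n is increased, are

$$\log\phi = C + \tfrac{1}{2}\, n\bar{b}\,(\theta-\theta_1)^{2};$$

hence

$$\phi \propto e^{\frac{1}{2} n\bar{b}\,(\theta-\theta_1)^{2}}.$$

Now this factor is constant for all samples which have the same value of $\theta_1$, hence
the variation of $\Phi$ with respect to $\theta$ is represented by the same factor, and consequently

$$\log\Phi = C' + \tfrac{1}{2}\, n\bar{b}\,(\theta-\theta_1)^{2};$$

whence

$$-\frac{1}{\sigma_{\theta_1}^{2}} = \frac{\partial^{2}}{\partial\theta^{2}}\log\Phi = n\bar{b},$$

where

$$b = \frac{\partial^{2}}{\partial\theta^{2}}\log f(\theta_1),$$

$\theta_1$ being the optimum value of $\theta$.

The formula

$$-\frac{1}{\sigma_{\theta_1}^{2}} = n\,\overline{\frac{\partial^{2}}{\partial\theta^{2}}\log f}$$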

supplies the most direct way known to me of finding the probable errors of statistics. 
It may be seen that the above proof applies only to statistics obtained by the method 
of maximum likelihood.* 

For example, to find the standard deviation of $\hat p = x/n$ 



* A similar method of obtaining the standard deviations and correlations of statistics derived from 
large samples was developed by Pearson and Filon in 1898 (16). It is unfortunate that in this memoir 
no sufficient distinction is drawn between the population and the sample, in consequence of which the 
formulae obtained indicate that the likelihood is always a maximum (for continuous distributions) when 
the mean of each variate in the sample is equated to the corresponding mean in the population (16, p. 232, 
“ A,. = 0 ”). If this were so the mean would always be a sufficient statistic for location; but as we have 
already seen, and will see later in more detail, this is far from being the case. The same argument, indeed, 
is applied to all statistics, as to which nothing but their consistency can be truly affirmed. 

The probable errors obtained in this way are those appropriate to the method of maximum likelihood, 
but not in other cases to statistics obtained by the method of moments, by which method the examples 
given were fitted. In the ‘Tables for Statisticians and Biometricians’ (1914), the probable errors of the 
constants of the Pearsonian curves are those proper to the method of moments; no mention is there made 
of this change of practice, nor is the publication of 1898 referred to. 

It would appear that shortly before 1898 the process which leads to the correct value of the probable 
errors of optimum statistics was hit upon and found to agree with the probable errors of statistics found 
by the method of moments for normal curves and surfaces; without further enquiry it would appear to 
have been assumed that this process was valid in all cases, its directness and simplicity being peculiarly 
attractive. The mistake was at that time, perhaps, a natural one; but that it should have been discovered 
and corrected without revealing the inefficiency of the method of moments is a very remarkable circumstance. 

In 1903 the correct formulae for the probable errors of statistics found by the method of moments are 
given in ‘Biometrika’ (19); references are there given to Sheppard (20), whose method is employed, as 
well as to Pearson and Filon (16), although both the method and the results differ from those of the latter. 





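in samples from an infinite population of which the true value is p, 

$$\log f = x\log p + y\log(1-p),$$

$$\frac{\partial}{\partial p}\log f = \frac{x}{p} - \frac{y}{1-p},$$

$$\frac{\partial^2}{\partial p^2}\log f = -\frac{x}{p^2} - \frac{y}{(1-p)^2}.$$

Now the mean value of x is pn, and of y is (1-p)n, hence the mean value of 
$\frac{\partial^2}{\partial p^2}\log f$ is 

$$-\left(\frac{1}{p} + \frac{1}{1-p}\right)n;$$

therefore 

$$\sigma_{\hat p}^2 = \frac{p(1-p)}{n},$$

the well-known formula for the standard error of p. 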
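The binomial result can be checked numerically. The following Python sketch is an illustration only and is not part of the memoir: the function names and the counts x = 30, y = 70 are assumptions chosen for the example. It evaluates the second differential of log f at the optimum by a central difference and compares the resulting standard error with the closed form.

```python
import math

def binomial_log_likelihood(p, x, y):
    # log f = x log p + y log(1 - p), as in the text (constant term omitted)
    return x * math.log(p) + y * math.log(1.0 - p)

def second_derivative(func, point, step=1e-4):
    # Central-difference estimate of the second differential of func at point
    return (func(point + step) - 2.0 * func(point) + func(point - step)) / (step * step)

def standard_error_from_curvature(x, y):
    # The optimum value of p is x / n; its variance is -1 / (second differential)
    p_hat = x / (x + y)
    curvature = second_derivative(lambda p: binomial_log_likelihood(p, x, y), p_hat)
    return p_hat, math.sqrt(-1.0 / curvature)

# Hypothetical sample: 30 successes and 70 failures, so n = 100
p_hat, se = standard_error_from_curvature(x=30, y=70)
closed_form = math.sqrt(p_hat * (1.0 - p_hat) / 100)
print(p_hat, se, closed_form)
```

At the optimum the observed second differential equals its mean value with p replaced by x/n, so the two standard errors agree up to the error of the finite difference.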


7. Satisfaction of the Criterion of Sufficiency. 

That the criterion of sufficiency is generally satisfied by the solution obtained by 
the method of maximum likelihood appears from the following considerations. 

If the individual values of any sample of data are regarded as co-ordinates in 
hyperspace, then any sample may be represented by a single point, and the frequency 
distribution of an infinite number of random samples is represented by a density 
distribution in hyperspace. If any set of statistics be chosen to be calculated from 
the samples, certain regions will provide identical sets of statistics; these may be called 
isostatistical regions. For any particular space element, corresponding to an actual 
sample, there will be a particular set of parameters for which the frequency in that 
element is a maximum ; this will be the optimum set of parameters for that element. 
If now the set of statistics chosen are those which give the optimum values of the 
parameters, then all the elements of any part of the same isostatistical region will 
contain the greatest possible frequency for the same set of values of the parameters, 
and therefore any region which lies wholly within an isostatistical region will contain 
its maximum frequency for that set of values. 

Now let $\theta$ be the value of any parameter, $\hat\theta$ the statistic calculated by the method of 
maximum likelihood, and $\theta_1$ any other statistic designed to estimate the value of $\theta$, 
then for a sample of given size, we may take 

$$f(\theta, \hat\theta, \theta_1)\, d\hat\theta\, d\theta_1$$

to represent the frequency with which $\hat\theta$ and $\theta_1$ lie in the assigned ranges $d\hat\theta$ and $d\theta_1$. 




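The region $d\hat\theta\, d\theta_1$ evidently lies wholly in the isostatistical region $d\hat\theta$. Hence the 
equation 

$$\frac{\partial}{\partial\theta}\log f(\theta, \hat\theta, \theta_1) = 0$$

is satisfied, irrespective of $\theta_1$, by the value $\theta = \hat\theta$. This condition is satisfied if 

$$f(\theta, \hat\theta, \theta_1) = \phi(\theta, \hat\theta)\cdot\phi'(\hat\theta, \theta_1);$$

for then 

$$\frac{\partial}{\partial\theta}\log f = \frac{\partial}{\partial\theta}\log\phi,$$

and the equation for the optimum degenerates into 

$$\frac{\partial}{\partial\theta}\log\phi(\theta, \hat\theta) = 0,$$

which does not involve $\theta_1$. 

But the factorisation of f into factors involving $(\theta, \hat\theta)$ and $(\hat\theta, \theta_1)$ respectively is merely 
a mathematical expression of the condition of sufficiency; and it appears that any 
statistic which fulfils the condition of sufficiency must be a solution obtained by the 
method of the optimum. 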

It may be expected, therefore, that we shall be led to a sufficient solution of problems 
of estimation in general by the following procedure. Write down the formula for the 
probability of an observation falling in the range dx in the form 

$$f(\theta, x)\,dx,$$

where $\theta$ is an unknown parameter. Then if 

$$L = S(\log f),$$

the summation being extended over the observed sample, L differs by a constant only 
from the logarithm of the likelihood of any value of $\theta$. The most likely value, $\hat\theta$, is 
found by the equation 

$$\frac{\partial L}{\partial\theta} = 0,$$

and the standard deviation of $\hat\theta$, by a second differentiation, from the formula 

$$\frac{\partial^2 L}{\partial\theta^2} = -\frac{1}{\sigma_{\hat\theta}^2};$$

this latter formula being applicable only where $\hat\theta$ is normally distributed, as is often 
the case with considerable accuracy in large samples. The value $\sigma_{\hat\theta}$ so found is in 
these cases the least possible value for the standard deviation of a statistic designed to 
estimate the same parameter; it may therefore be applied to calculate the efficiency of 
any other such statistic. 

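When several parameters are determined simultaneously, we must equate the second 
differentials of L, with respect to the parameters, to the coefficients of the quadratic 
terms in the index of the normal expression which represents the distribution of the 
corresponding statistics. Thus with two parameters, 

$$\frac{\partial^2 L}{\partial\theta_1^2} = -\frac{1}{1-r^2}\cdot\frac{1}{\sigma_{\hat\theta_1}^2},\qquad \frac{\partial^2 L}{\partial\theta_1\,\partial\theta_2} = +\frac{1}{1-r^2}\cdot\frac{r}{\sigma_{\hat\theta_1}\sigma_{\hat\theta_2}},\qquad \frac{\partial^2 L}{\partial\theta_2^2} = -\frac{1}{1-r^2}\cdot\frac{1}{\sigma_{\hat\theta_2}^2},$$

or, in effect, $\sigma_{\hat\theta}^2$ is found by dividing the Hessian determinant of L, with respect to the 
parameters, into the corresponding minor. 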

The application of these methods to such a series of parameters as occur in the speci¬ 
fication of frequency curves may best be made clear by an example. 

8. The Efficiency of the Method of Moments in Fitting Curves of the 
Pearsonian Type III. 

Curves of Pearson’s Type III. offer a good example for the calculation of the efficiency 
of the Method of Moments. The chance of an observation falling in the range dx is 

$$df = \frac{1}{a}\,\frac{1}{p!}\left(\frac{x-m}{a}\right)^{p} e^{-\frac{x-m}{a}}\,dx.$$ *

By the method of moments the curve is located by means of the statistic $\mu_1$, its dimensions 
are ascertained from the second moment $\mu_2$, and the remaining parameter p is 
determined from $\beta_1$. Considering first the problem of location, if a and p were known 
and we had only to determine m, we should take, according to the method of moments, 

$$\mu_1 = m_\mu + a(p+1),$$

where $m_\mu$ represents the estimate of the parameter m, obtained by using the method of 
moments. The variance of $m_\mu$ is, therefore, 

$$\sigma_{m_\mu}^2 = \frac{\mu_2}{n} = \frac{a^2(p+1)}{n}.$$

If, on the other hand, we aim at greater accuracy, and make the likelihood of the 
sample a maximum for variations of m, we have 

$$L = -n\log a - n\log(p!) + p\,S\!\left(\log\frac{x-m}{a}\right) - S\!\left(\frac{x-m}{a}\right),$$

* The expression, x!, is used here and throughout as equivalent to the Gaussian $\Pi(x)$, or to $\Gamma(x+1)$, 
whether x is an integer or not. 



and the equation to determine m is 

$$\frac{\partial L}{\partial m} = -p\,S\!\left(\frac{1}{x-m}\right) + \frac{n}{a} = 0; \qquad (1)$$

the accuracy of the value so obtained is found from the second differential, 

$$\frac{\partial^2 L}{\partial m^2} = -p\,S\!\left(\frac{1}{(x-m)^2}\right),$$

of which the mean value is 

$$-\frac{n}{a^2(p-1)},$$

whence 

$$\sigma_{\hat m}^2 = \frac{a^2(p-1)}{n}.$$

We now see that the efficiency of location by the method of moments is 

$$\frac{p-1}{p+1} = 1 - \frac{2}{p+1}.$$

Efficiencies of over 80 per cent. for location are therefore obtained if p exceeds 9; for 
p = 1 the efficiency of location vanishes, as in other cases where the curve makes an 
angle with the axis at the end of its range. 

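Turning now to the problem of scaling, we have, by the method of moments, 

$$\mu_2 = a_\mu^2(p+1),$$

whence, knowing p, a is obtained. Since 

$$\frac{\sigma_{\mu_2}^2}{\mu_2^2} = \frac{\beta_2-1}{n},$$

we must have 

$$\frac{\sigma_{a_\mu}^2}{a^2} = \frac{\beta_2-1}{4n} = \frac{p+4}{2(p+1)\,n};$$

On the other hand, from the value of L, we find the equation 

$$\frac{\partial L}{\partial a} = -\frac{n}{a}(p+1) + S\!\left(\frac{x-m}{a^2}\right) = 0, \qquad (2)$$

to be solved for m and a as a simultaneous equation with (1); whence 

$$\frac{\partial^2 L}{\partial m\,\partial a} = -\frac{n}{a^2},$$

and 

$$\frac{\partial^2 L}{\partial a^2} = \frac{n}{a^2}(p+1) - \frac{2}{a^3}\,S(x-m),$$

of which the mean value is 

$$-\frac{n}{a^2}(p+1).$$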


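The variance of a, determined from this pair of simultaneous equations, is found by 
dividing 

$$-\frac{\partial^2 L}{\partial m^2} = \frac{n}{a^2(p-1)}$$

by the determinant 

$$\frac{\partial^2 L}{\partial m^2}\,\frac{\partial^2 L}{\partial a^2} - \left(\frac{\partial^2 L}{\partial m\,\partial a}\right)^2,$$

which reduces to 

$$\frac{n^2}{a^4}\left\{\frac{p+1}{p-1} - 1\right\} = \frac{2}{p-1}\,\frac{n^2}{a^4},$$

whence 

$$\sigma_{\hat a}^2 = \frac{a^2}{2n},$$

and the efficiency of scaling by the method of moments is 

$$\frac{p+1}{p+4} = 1 - \frac{3}{p+4}.$$

Efficiencies of over 80 per cent. for scaling are, therefore, obtained when p exceeds 11. 
The efficiency of scaling does not, however, vanish for any possible value of p, though 
it tends to zero, as p approaches its limiting value, -1. 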
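The two efficiencies just obtained for Type III curves, (p - 1)/(p + 1) for location and (p + 1)/(p + 4) for scaling, are easily tabulated. The Python sketch below is illustrative only; the function names are assumptions, and the variances are written in units of a²/n exactly as they arise in the derivation above.

```python
def type3_location_efficiency(p):
    # Variances of the estimate of m, in units of a^2 / n
    var_moments = p + 1.0   # from the first moment
    var_optimum = p - 1.0   # from the method of maximum likelihood (needs p > 1)
    return var_optimum / var_moments

def type3_scaling_efficiency(p):
    # Variances of the estimate of a, in units of a^2 / n
    var_moments = (p + 4.0) / (2.0 * (p + 1.0))   # from the second moment
    var_optimum = 0.5                              # from the method of maximum likelihood
    return var_optimum / var_moments

for p in (2, 9, 11, 50):
    print(p, round(type3_location_efficiency(p), 4), round(type3_scaling_efficiency(p), 4))
```

The 80 per cent. thresholds quoted in the text correspond to p = 9 for location and p = 11 for scaling.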

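Lastly, p is found by the method of moments by putting 

$$\beta_1 = \frac{4}{p+1}.$$

Now 

$$\sigma_{\beta_1}^2 = \frac{\beta_1}{n}\left(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1\right),$$

and for curves of Type III, 

$$\beta_3 = 2\beta_1\beta_2 + 4\beta_1 = \beta_1(3\beta_1+10),$$

$$\beta_4 = 5\left(\beta_2 + \tfrac12\beta_3\right) = \tfrac52\left(3\beta_1^2 + 13\beta_1 + 6\right);$$

hence 

$$\sigma_{\beta_1}^2 = \frac{3\beta_1}{2n}\,(5\beta_1+4)(\beta_1+4),$$

$$\frac{\sigma_{\beta_1}^2}{\beta_1^2} = \frac{6}{n}\cdot\frac{(p+2)(p+6)}{p+1},$$

whence it follows, since n is large, that 

$$\frac{\sigma_{p_\mu}^2}{(p+1)^2} = \frac{\sigma_{\beta_1}^2}{\beta_1^2},\qquad \sigma_{p_\mu}^2 = \frac{6}{n}\,(p+1)(p+2)(p+6).$$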


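From the value of L, 

$$\frac{\partial L}{\partial p} = -n\,\frac{d}{dp}\log(p!) + S\!\left(\log\frac{x-m}{a}\right) = 0,$$

which equation solved for m, a and p as a simultaneous equation with (1) and (2), will 
yield the set of values for the parameters which has the maximum likelihood. To find 
the variance of the value of p, so obtained, observe that 

$$\frac{\partial^2 L}{\partial p^2} = -n\,\frac{d^2}{dp^2}\log(p!),\qquad \frac{\partial^2 L}{\partial m\,\partial p} = -S\!\left(\frac{1}{x-m}\right),$$

of which the mean value is 

$$-\frac{n}{ap},$$

and 

$$\frac{\partial^2 L}{\partial a\,\partial p} = -\frac{n}{a}.$$

The variance of p, derived from this set of simultaneous equations, is therefore found 
by dividing the minor of $\frac{\partial^2 L}{\partial p^2}$, namely 

$$\frac{2}{p-1}\,\frac{n^2}{a^4},$$

by the determinant 

$$\frac{n^3}{a^4}\begin{vmatrix} \frac{1}{p-1} & 1 & \frac{1}{p} \\ 1 & p+1 & 1 \\ \frac{1}{p} & 1 & \frac{d^2}{dp^2}\log(p!) \end{vmatrix} = \frac{n^3}{a^4}\,\frac{1}{p-1}\left\{2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2}\right\};$$

whence 

$$\sigma_{\hat p}^2 = \frac{2}{n\left\{2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2}\right\}}.$$

When p is large, 

$$2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2} = \frac{1}{3p^3} - \frac{1}{15p^5} + \ldots,$$

so that, approximately, 

$$\sigma_{\hat p}^2 = \frac{6p^3}{n};$$

for large values of p, the efficiency of the method of moments is, therefore, approximately 

$$\frac{p^3}{(p+1)(p+2)(p+6)}.$$

Efficiencies of over 80 per cent. occur when p exceeds 38.1 ($\beta_1$ = 0.102); evidently 
the method of moments is effective for determining the form of the curve only when it 
is relatively close to the normal form. For small values of p, the above approximation 
for the efficiency is not adequate. The true values can easily be obtained from the 
recently published tables of the Trigamma* function (11). The following values are 
obtained for the integral values of p from 0 to 5. 

p ...............   0     1        2        3        4        5 
Efficiency ......   0     0.0274   0.0871   0.1532   0.2150   0.2727 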
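The tabulated efficiencies can be recomputed without printed tables. The sketch below is an illustration under stated assumptions, not the method used in the memoir: it evaluates d²/dp² log(p!) as the trigamma function at p + 1, using the standard recurrence and asymptotic series, and takes the two variances in the forms derived above, namely 2/(n{2 d²/dp² log(p!) - 2/p + 1/p²}) for the optimum and 6(p+1)(p+2)(p+6)/n for the method of moments. The function names are hypothetical.

```python
def trigamma(x):
    """Second derivative of log Gamma(x) for x > 0 (recurrence plus asymptotic series)."""
    total = 0.0
    while x < 10.0:                       # psi'(x) = psi'(x + 1) + 1 / x^2
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0)))
    return total + series

def type3_form_efficiency(p):
    """Efficiency of the method of moments in estimating p for a Type III curve."""
    if p == 0:
        return 0.0                        # limiting value: the optimum variance vanishes
    bracket = 2.0 * trigamma(p + 1.0) - 2.0 / p + 1.0 / (p * p)
    var_optimum = 2.0 / bracket           # n times the variance by maximum likelihood
    var_moments = 6.0 * (p + 1.0) * (p + 2.0) * (p + 6.0)   # n times the variance by moments
    return var_optimum / var_moments

for p in range(6):
    print(p, round(type3_form_efficiency(p), 4))
```

On these assumptions the computed values agree with the table for p = 1, 2, 3 and 5; for p = 4 the computation gives about 0.2159, so the printed 0.2150 may be a misprint or a transcription error.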


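An interesting point which may be resolved at this stage of the enquiry is to find 
the variance of m, when a and p are not known, derived from the above set of simultaneous 
equations; that is to say, to calculate the accuracy with which the limiting 
point of the curve is determined; such determinations are often stated as the result 
of fitting curves of limited range, but their probable errors are seldom, if ever, evaluated. 
To obtain the greatest possible accuracy with which such a point can be determined 
we must divide the minor of $\frac{\partial^2 L}{\partial m^2}$, namely, 

$$\frac{n^2}{a^2}\left\{(p+1)\frac{d^2}{dp^2}\log(p!) - 1\right\},$$

by 

$$\frac{n^3}{a^4}\,\frac{1}{p-1}\left\{2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2}\right\},$$

whence 

$$\sigma_{\hat m}^2 = \frac{a^2(p-1)}{n}\cdot\frac{(p+1)\frac{d^2}{dp^2}\log(p!) - 1}{2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2}}.$$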

* It is sometimes convenient to write F(x) for $\frac{d^2}{dx^2}\log(x!)$. 

The position of the limiting point will, when p is at all large, evidently be determined 
with much less accuracy than is the position, as a whole, of a curve of known form and 
size. Let n' be a multiplier such that the position of the extremity of a curve calculated 
from nn' observations will be determined with the same accuracy as the position, as a 
whole, of a curve of known form and size, can be determined from a sample of n observations 
when n is large. Then 

$$n' = \frac{(p+1)\frac{d^2}{dp^2}\log(p!) - 1}{2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2}};$$

but, when p is large, 

$$(p+1)\frac{d^2}{dp^2}\log(p!) - 1 = \frac{1}{2p} - \ldots,$$

and 

$$2\frac{d^2}{dp^2}\log(p!) - \frac{2}{p} + \frac{1}{p^2} = \frac{1}{3p^3} - \ldots;$$

therefore, approximately, 

$$n' = \tfrac32\,p^2.$$

For large values of p the probable error of the determination of the end-point may be 
found approximately by multiplying the probable error of location by $p\sqrt{3/2}$. 

As p grows smaller, n' diminishes until it reaches unity, when p = 1. For values of 
p less than 1 it would appear that the end-point had a smaller probable error than the 
probable error of location, but, as a matter of fact, for these values location is determined 
by the end-point, and as we see from the vanishing of $\sigma_{\hat m}$, whether or not p and a 
are known, when p = 1, the weight of the determination from this point onwards increases 
more rapidly than n, as the sample increases. (See Section 10.) 

The above method illustrates how it is possible to calculate the variance of any 
function of the population parameters as estimated from large samples; by comparing 
this variance with that of the same function estimated by the method of moments, we 
may find the efficiency of that method for any proposed function. The above examination, 
in which the determinations of the locus, the scale, and the form of the curve are 
treated separately, will serve as a general criterion of the application of the method of 
moments to curves of Type III. Special combinations of the parameters will, however, 
be of interest in special cases. It may be noted here that by virtue of equation (2) the 
function m + a(p + 1) is the same, whether determined by moments or by the method 
of the optimum: 

$$\hat m + \hat a(\hat p + 1) = m_\mu + a_\mu(p_\mu + 1).$$

The efficiency of the method of moments in determining this function is therefore 100 
per cent. That this function is the abscissa of the mean does not imply 100 per cent. 
efficiency of location, for the centre of location of these curves is not the mean (see p. 340). 



9. Location and Scaling of Frequency Curves in General. 


The general problem of the location and scaling of curves may now be treated more 
generally. This is the problem which presents itself with respect to error curves of 
assumed form, when to find the best value of the quantity measured we must locate the 
curve as accurately as possible, and to find the probable error of the result of this process 
we must, as accurately as possible, estimate its scale. 

The form of the curve may be specified by a function $\phi$, such that 

$$df \propto e^{\phi(\xi)}\,dx, \quad\text{when}\quad \xi = \frac{x-m}{a}.$$

In this expression $\phi$ specifies the form of the curve, which is unaltered by variations 
of a and m. 

When a sample of n observations has been taken, the likelihood of any combination 
of values of a and m is 

$$L = C - n\log a + S(\phi),$$

whence 

$$\frac{\partial L}{\partial m} = -\frac{1}{a}\,S(\phi'),$$

also 

$$\frac{\partial L}{\partial a} = -\frac{n}{a} - \frac{1}{a}\,S(\xi\phi').$$

Differentiating a second time, 

$$\frac{\partial^2 L}{\partial m^2} = \frac{1}{a^2}\,S(\phi''),$$

therefore 

$$\sigma_{\hat m}^2 = -\frac{a^2}{n\,\overline{\phi''}}.$$


This expression enables us to compare the accuracy of error curves of different form, 
when the location is performed in each case by the method which yields the minimum 
error. 

Example: The curve 

$$df = \frac{1}{\pi}\,\frac{d\xi}{1+\xi^2},$$

referred to in Section 5 has an infinite standard deviation, but it is not on that account 
an error curve of zero accuracy, for 

$$\phi = -\log(1+\xi^2),$$

$$\phi' = -\frac{2\xi}{1+\xi^2},\qquad \phi'' = -\frac{2(1-\xi^2)}{(1+\xi^2)^2}.$$

Now 

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1-\xi^2}{(1+\xi^2)^3}\,d\xi = \frac14,$$

hence 

$$\overline{\phi''} = -\tfrac12 \quad\text{and}\quad \sigma_{\hat m}^2 = \frac{2a^2}{n}.$$

The quantity, 

$$\frac{1}{n\,\sigma_{\hat m}^2} = -\frac{\overline{\phi''}}{a^2},$$

which is the factor by which n is multiplied in calculating the weight of the estimate 
made from n measurements, may be called the intrinsic accuracy of an error curve. In 
the above example we see that errors distributed so that 

$$df = \frac{1}{\pi}\,\frac{a\,dx}{a^2+x^2}$$

have the same intrinsic accuracy as errors distributed according to the normal curve 

$$df = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}\,dx,$$

provided 

$$\sigma^2 = 2a^2.$$

Fig. 1 illustrates two such curves of equal intrinsic accuracy. 
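The mean value of the second differential for the curve of the example, with φ = -log(1 + ξ²), can be confirmed by direct integration. The Python sketch below is illustrative and its function names are assumptions. It uses the substitution ξ = tan θ, under which a variate distributed as dξ/π(1 + ξ²) becomes uniform in θ between -π/2 and π/2, and applies the midpoint rule.

```python
import math

def phi_second(xi):
    # Second differential of phi = -log(1 + xi^2)
    return -2.0 * (1.0 - xi * xi) / (1.0 + xi * xi) ** 2

def mean_under_curve(func, steps=20000):
    # Mean of func(xi) when xi is distributed as d(xi) / (pi (1 + xi^2)).
    # With xi = tan(theta), theta is uniform on (-pi/2, pi/2); midpoint rule in theta.
    width = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = -math.pi / 2.0 + (i + 0.5) * width
        total += func(math.tan(theta))
    return total * width / math.pi

mean_phi_second = mean_under_curve(phi_second)
a = 1.0                                          # hypothetical scale parameter
intrinsic_accuracy = -mean_phi_second / (a * a)  # equals 1 / (2 a^2)
print(mean_phi_second, intrinsic_accuracy)
```

The mean comes out at -1/2, so that the intrinsic accuracy is 1/(2a²), the same as that of a normal curve whose variance is 2a².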
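Returning now to the general problem in which 

$$L = C - n\log a + S(\phi),$$

we have 

$$\frac{\partial^2 L}{\partial m\,\partial a} = \frac{1}{a^2}\,S(\phi' + \xi\phi'')$$

and 

$$\frac{\partial^2 L}{\partial a^2} = \frac{n}{a^2} + \frac{1}{a^2}\,S(2\xi\phi' + \xi^2\phi'').$$

The latter expression will directly give the accuracy with which a is determined only if 
the mean value of 

$$\frac{\partial^2 L}{\partial m\,\partial a}$$

is zero, and we can always arrange that this shall be so by subtracting from $\xi$ the quantity 

$$\frac{\overline{\xi\phi''}}{\overline{\phi''}}.$$
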
Thus in a Type III. curve where, referred to the end of the range, 

$$\frac{\overline{\xi\phi''}}{\overline{\phi''}} = p-1,$$

instead of 

$$\phi = p\log\xi - \xi$$

we must write 

$$\phi = p\log(\xi+p-1) - \xi;$$

$$\phi'' = -\frac{p}{(\xi+p-1)^2},\qquad \xi\phi'' = -\frac{p\,\xi}{(\xi+p-1)^2},$$

of which the mean value is zero. 

For one particular point of origin, therefore, the variations of the abscissa are 
uncorrelated with those of a; this point may be termed the centre of location. 

Example :—To determine the centre of location of the curve of Type IV., 


from these we find 




0 = — r tail"* log I 


,{/' = 7+2 iT?" ' +2 (.f-mF) rr? ' 


/•+1 r +2 7’+4 


r-fl r+2 V 




The centre of location, therefore, at the distance from the mode, 





Example: Determine the intrinsic accuracy of an error curve of Type IV. and the 
efficiency of the method of moments in location and scaling. 

Since 

$$\overline{\phi''} = -\frac{(r+1)(r+2)(r+4)}{(r+4)^2+\nu^2},$$

$$\sigma_{\hat m}^2 = \frac{a^2}{n}\cdot\frac{(r+4)^2+\nu^2}{(r+1)(r+2)(r+4)},$$

and the intrinsic accuracy of the curve is 

$$\frac{1}{a^2}\cdot\frac{(r+1)(r+2)(r+4)}{(r+4)^2+\nu^2};$$

therefore the efficiency of the method of moments in location is 

$$\frac{r^2(r-1)}{(r+1)(r+2)(r+4)}\cdot\frac{(r+4)^2+\nu^2}{r^2+\nu^2}. \qquad (3)$$

When $\nu$ = 0, we have for curves of Type VII. an efficiency of location 

$$1 - \frac{6}{(r+1)(r+2)}.$$

The efficiency of location of these curves vanishes at r = 1, at which value the standard 
deviation becomes infinite. Although values down to -1 give admissible frequency 
curves, the conventional limit at which curves are reckoned as heterotypic is at r = 7. 
For this value the efficiency is 

$$\frac{49}{132}\cdot\frac{121+\nu^2}{49+\nu^2},$$

which varies from 91.67 per cent. for the symmetrical Type VII. curve, to 37.12 per 
cent. when $\nu \to \infty$ and the curve tends to Type V. 

Turning to the question of scaling, we find 

$$\frac{1}{\sigma_{\hat a}^2} = \frac{n}{a^2}\cdot\frac{2(r+1)}{r+4};$$

the intrinsic accuracy of scaling is therefore independent of $\nu$. Now for these curves 

$$\beta_2 = \frac{3(r-1)}{(r-2)(r-3)}\left\{(r+6) - \frac{8r^2}{r^2+\nu^2}\right\},$$

so that 

$$\frac{\beta_2-1}{4} = \frac{r^3(r-2) + \nu^2(r^2+10r-12)}{2(r-2)(r-3)(r^2+\nu^2)},$$

and 

$$\frac{\sigma_{a_\mu}^2}{a^2} = \frac{1}{n}\cdot\frac{r^3(r-2) + \nu^2(r^2+10r-12)}{2(r-2)(r-3)(r^2+\nu^2)}.$$

The efficiency of the method of moments for scaling is thus 

$$\frac{(r-2)(r-3)(r+4)(r^2+\nu^2)}{(r+1)\left\{r^3(r-2) + \nu^2(r^2+10r-12)\right\}}; \qquad (4)$$

when $\nu$ = 0, we have for curves of Type VII. an efficiency of scaling 

$$1 - \frac{12}{r(r+1)}.$$

The efficiency of the method of moments in scaling these curves vanishes at r = 3, 
where $\beta_2$ becomes infinite; for r = 7, the efficiency of scaling is 

$$\frac{55}{2}\cdot\frac{49+\nu^2}{1715+107\nu^2},$$

varying in value from 78.57 per cent. for the symmetrical Type VII. curve, to 25.70 
per cent. when $\nu \to \infty$ and the curve tends to Type V. 
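The percentages quoted for r = 7 follow from the general expressions for the efficiency of location and of scaling of Type IV curves. The Python sketch below evaluates them in the forms r²(r-1){(r+4)²+ν²}/{(r+1)(r+2)(r+4)(r²+ν²)} and (r-2)(r-3)(r+4)(r²+ν²)/[(r+1){r³(r-2)+ν²(r²+10r-12)}]; these forms are reconstructed so as to reproduce the figures given in the text, and the function names are assumptions.

```python
def type4_location_efficiency(r, nu):
    # Efficiency of the first moment in locating a Type IV curve
    numerator = r * r * (r - 1.0) * ((r + 4.0) ** 2 + nu * nu)
    denominator = (r + 1.0) * (r + 2.0) * (r + 4.0) * (r * r + nu * nu)
    return numerator / denominator

def type4_scaling_efficiency(r, nu):
    # Efficiency of the second moment in scaling a Type IV curve
    numerator = (r - 2.0) * (r - 3.0) * (r + 4.0) * (r * r + nu * nu)
    denominator = (r + 1.0) * (r ** 3 * (r - 2.0) + nu * nu * (r * r + 10.0 * r - 12.0))
    return numerator / denominator

# nu = 0 gives the symmetrical Type VII curve; a very large nu approaches Type V
for nu in (0.0, 1.0e6):
    print(nu, round(type4_location_efficiency(7.0, nu), 4),
          round(type4_scaling_efficiency(7.0, nu), 4))
```

With ν = 0 the two functions reduce to 1 - 6/{(r+1)(r+2)} and 1 - 12/{r(r+1)} respectively.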


10. The Efficiency of the Method of Moments in fitting the Pearsonian Curves. 

The Pearsonian group of skew curves are obtained as solutions of the equation 

$$\frac{1}{y}\,\frac{dy}{dx} = \frac{-(x-m)}{a+bx+cx^2};$$

algebraically these fall into two main classes, 

and 

according as the roots of the quadratic expression in (6) are real or imaginary. 





The first of these forms may be rewritten 

.Jdx, 

r being negative, showing its affinity with the second class. 

In order that these expressions may represent frequency curves, it is necessary that 
the integral over the whole range of the curve should be finite ; this restriction acts in 
two ways :— 

(1) When the curve terminates at a finite value of x, say x = $a_2$, the power to which 
$a_2 - x$ is raised must be greater than -1. 

(2) When the curve extends to infinity, the ordinate, when x is large, must diminish 
more rapidly than $\frac{1}{x}$. 

In Fig. 2 is shown a conspectus of all possible frequency curves of the Pearsonian type; 
the lines AC and AC' represent the limits along which the area between the curve and 
a vertical ordinate tends to infinity, and on which $m_1$, or $m_2$, takes the value -1; the 
line CC' represents the limit at which unbounded curves enclose an infinite area with 
the horizontal axis; at this limit r = -1. 

[Fig. 2. Conspectus of the Pearsonian system of frequency curves.] 

The symmetrical curves of Type II. 

$$df \propto \left(1 - \frac{x^2}{a^2}\right)^{-\frac{r+2}{2}} dx$$

extend from the point N, representing the normal curve, at which r is infinite, through 
the point P at which r = -4, and the curve is a parabola, to the point B (r = -2), 
where the curve takes the form of a rectangle; from this point the curves are U-shaped, 
and at A, when the arms of U are hyperbolic, we have the limiting curve of this type, 
which is the discontinuous distribution of equal or unequal dichotomy (r = 0). 

The unsymmetrical curves of Type I. are divided by Pearson into three classes 
according as the terminal ordinate is infinite at neither end, at one end (J curves), or 
at both ends (U curves); the dividing lines are C'BD and CBD', along which one of 
the terminal ordinates is finite ($m_1$, or $m_2$, = 0); at the point B, as we have seen, both 
terminal ordinates are finite. 

The same line of division divides the curves of Type III., 

$$df \propto x^{p}\,e^{-x}\,dx,$$

at the point E (p = 0), representing a simple exponential curve; the J curves of Type III. 
extend to F (p = -1), at which point the integral ceases to converge. In curves of 
Type III., r is infinite; $\nu$ is also infinite, but one of the quantities $m_1$ and $m_2$ is finite, 
or zero (= p); as p tends to infinity we approach the normal curve 

$$df \propto e^{-\frac{x^2}{2\sigma^2}}\,dx.$$

Type VI., like Type III., consists of curves bounded only at one end; here r is 
positive, and both $m_1$ and $m_2$ are finite or zero. For the J curves of Type VI. both 
$m_1$ and $m_2$ are negative, but for the remainder of these curves they are of opposite sign, 
the negative index being the greater by at least unity in order that the representative 
point may fall above CC' (r = -1). 

Type V. is here represented by a parabola separating the regions of Types IV. and VI.; 
the typical equation of this type of curve is 

$$df \propto x^{-(r+2)}\,e^{-\frac{1}{x}}\,dx.$$

As r tends to infinity the curve tends to the normal form; the integral does not 
become divergent until r + 2 = 1, or r = -1. On curves of Type V., then, r is finite 
or zero, but $\nu$ is infinite. 



In Type IV. 

$$df \propto \left(1+\frac{x^2}{a^2}\right)^{-\frac{r+2}{2}} e^{-\nu\tan^{-1}\frac{x}{a}}\,dx;$$

we have written $\nu$, not as previously for the difference between $m_1$ and $m_2$, for these 
quantities are now complex, and their difference is a pure imaginary, but for the difference 
divided by $\sqrt{-1}$; $\nu$ is then real and finite throughout Type IV., and it vanishes 
along the line NS, representing the symmetrical curves of Type VII. 

$$df \propto \left(1+\frac{x^2}{a^2}\right)^{-\frac{r+2}{2}} dx,$$

from r = $\infty$ to r = -1. 

The Pearsonian system of frequency curves has hitherto been represented by the 
diagram (13, p. 66), in which the co-ordinates are $\beta_1$ and $\beta_2$. This is an unsymmetrical 
diagram which, since $\beta_1$ is necessarily positive, places the symmetrical curves on a 
boundary, whereas they are the central types from which the unsymmetrical curves 
diverge on either hand; further, neither of the limiting conditions of these curves can 
be shown on the diagram; the limit of the U curves is left obscure,* and the other 
limits are either projected to infinity, or, what is still more troublesome, the line at 
infinity cuts across the diagram, as occurs along the line r = 3, for there $\beta_2$ becomes 
infinite. This diagram thus excludes all curves of Types VII., IV., V., and VI., for which 
r < 3. 

In the fi diagram the condition r = constant yields a system of concurrent straight 
lines. The basis of the representation in fig. 2 lies in making these lines parallel and 

horizontal, so that the ordinate is a function of r only. We have chosen r ~ y — -, 

and have represented the limiting types by the simplest geometrical forms, straight lines 
and parabolas, by taking 

4e = = 

It might have been thought that use could have been made of the criterion, 
4(4)8,-3/8,)(2/S>-3/8i~ 6) 4e. 

by which Pearson distinguishes these curves; but this criterion is only valid in the 
region treated by Pearson. For when r = 0, $\kappa_2$ = 1, and we should have to place 
a variety of curves of Types VII., IV., V., and VI., all in Type V. in order to adhere to 
the criterion. 

* The true limit is the line $\beta_2 = \beta_1 + 1$, along which the curves degenerate into simple dichotomies. 

This diagram gives, I believe, the simplest possible conspectus of the whole of the 
Pearsonian system of curves; the inclusion of the curves beyond r = 3 becomes necessary 
as soon as we take a view unrestricted by the method of moments; of the so-called 
heterotypic curves between r = 3 and r = 7 it should be noticed that they not 
only fall into the ordinary Pearsonian types, but have finite values for the moment 
coefficients $\beta_1$ and $\beta_2$; they differ from those in which r exceeds 7, merely in the fact 
that the value of $\beta_2$, calculated from the fourth moment of a sample, has an infinite probable 
error. It is therefore evident that this is not the right method to treat the sample, but 
this does not constitute, as it has been called, “the failure of Type IV.,” but merely 
the failure of the method of moments to make a valid estimate of the form of these 
curves. As we shall see in more detail, the method of moments, when its efficiency is 
tested, fails equally in other parts of the diagram. 

In expression (3) we have found that the efficiency of the method of moments for 
location of a curve of Type IV. is 

$$E = \frac{r^2(r-1)}{(r+1)(r+2)(r+4)}\cdot\frac{(r+4)^2+\nu^2}{r^2+\nu^2},$$

whence if we substitute for r and $\nu$ in terms of the co-ordinates of our diagram, we obtain 
a general formula for the efficiency of the method of moments in locating Pearsonian 
curves, which is applicable within the boundary of the zero contour (fig. 3). This may 
be called the region of validity of the first moment; it is bounded at the base by the 
line r = 1, so that the first moment is valid far beyond the heterotypic limit; its other 
boundary, however, represents those curves which make a finite angle with the axis at 
the end of their range ($m_1$, or $m_2$, = 1); all J curves ($m_1$, or $m_2$, < 0) are thus excluded. 
This boundary has a double point at P, which thus forms the apex of the region of validity. 

Fig. 3. Region of validity of the first moment (the mean) applied in the location of 
Pearsonian curves showing contours of efficiency. 




In fig. 3 are shown the contours along which the efficiency is 20, 40, 60, and 80 per cent. 
For high efficiencies these contours tend to the system of ellipses, 

+ = 1-E. 

In a similar manner, we have obtained in expression (4) the efficiency of the 
second moment in fitting Pearsonian curves. The region of validity in this case is 
shown in fig. 4; this region is bounded by the lines r = 3, r = -4, and by the limits 
($m_1$, or $m_2$, = -1) on which $\sigma_{\hat a}$ vanishes. This statistic is therefore valid for certain 
J curves, though the maximum efficiency among the J curves is about 30 per cent. 
As before, the contours are centred about the normal curve (N) and for high efficiencies 
tend to the system of concentric circles, 

1lV+] 2//’= l-E, 

showing that the region of high efficiency is somewhat more restricted for the second 
moment, as compared to the first. 

The lower boundary to the efficiencies of these statistics is due merely to their probable 
errors becoming infinite, a weakness of the method of moments which has been partially 
recognised by the exclusion of the so-called heterotypic curves (r < 7). The stringency 
of the upper boundary is much more unexpected; the probable errors of the moments do 
not here become infinite ; only the ratio of the probable errors of the moments to the 
probable error of the corresponding optimum statistics is great and tends to infinity as 
the size of the sample is increased. 

That this failure as regards location occurs when the curve makes a finite angle with 
the axis may be seen by considering the occurrence of observations near the terminus 
of the curve. 

Let 

$$df = k\,x^{\alpha}\,dx$$

in the neighbourhood of the terminus, then the chance of an observation falling within 
a distance x of the terminus is 

$$f = \frac{k}{\alpha+1}\,x^{\alpha+1},$$

and the chance of n observations all failing to fall in this region is 

$$(1-f)^n$$

or, when n is great, and f correspondingly small, 

$$e^{-fn}.$$

Equating this to any finite probability, we have 

$$x^{\alpha+1} \propto \frac{1}{n},$$

or, in other words, if we use the extreme observation as a means of locating the terminus, 
the error, x, is proportional to 

$$n^{-\frac{1}{\alpha+1}};$$

when $\alpha$ < 1, this quantity diminishes more rapidly than $n^{-\frac12}$, and consequently for large 
samples it is much more accurate to locate the curve by the extreme observation than 
by the mean. 

Since it might be doubted whether such a simple method could really be more accurate 
than the process of finding the actual mean, we will take as example the location of 
the curve (B) in the form of a rectangle, 

$$df = \frac{dx}{a},\qquad m - \frac{a}{2} < x < m + \frac{a}{2},$$

and 

$$df = 0,$$

outside these limits. 

This is one of the simplest types of distribution, and we may readily obtain examples 
of it from mathematical tables. The mean of the distribution is m, and the standard 
deviation $\frac{a}{\sqrt{12}}$, the error $m_\mu - m$, of the mean obtained from n observations, when n is 
reasonably large, is therefore distributed according to the formula 

$$\frac{1}{a}\sqrt{\frac{6n}{\pi}}\;e^{-\frac{6n x^2}{a^2}}\,dx.$$

The difference of the extreme observation from the end of the range is distributed 
according to the formula 

$$\frac{n}{a}\,e^{-\frac{n\xi}{a}}\,d\xi;$$

if $\xi$ is the difference at one end of the range and $\eta$ the difference at the other end, the 
joint distribution (since, when n is considerable, these two quantities may be regarded 
as independent) is 

$$\frac{n^2}{a^2}\,e^{-\frac{n}{a}(\xi+\eta)}\,d\xi\,d\eta.$$

Now if we take the mean of the extreme observations of the sample, our error is 

$$\frac{\xi-\eta}{2},$$

for which we write x; writing also y for $\xi + \eta$, we have the joint distribution of x and y, 

$$\frac{n^2}{a^2}\,e^{-\frac{ny}{a}}\,dx\,dy.$$

For a given value of x the values of y range from 2|x| to $\infty$, whence, integrating with 
respect to y, we find the distribution of x to be 

$$df = \frac{n}{a}\,e^{-\frac{2n|x|}{a}}\,dx.$$

The two error curves are thus of a radically different form, and strictly no value for 
the efficiency can be calculated; if, however, we consider the ratio of the two standard 
deviations, then 

$$\frac{\sigma_{\hat m}^2}{\sigma_{m_\mu}^2} = \frac{a^2}{2n^2}\div\frac{a^2}{12n} = \frac{6}{n},$$

when n is large, a quantity which diminishes indefinitely as the sample is increased. 
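The ratio 6/n may be illustrated by simulation. The following Python sketch is not part of the memoir; the sample size, the number of trials and the seed are arbitrary assumptions. It draws repeated samples from a rectangle with m = 0.5 and a = 1 and compares the mean square error of the mean of the two extreme observations with that of the ordinary mean.

```python
import random

def simulate_rectangle(n=100, trials=2000, seed=1):
    # Samples from the rectangle 0 < x < 1, so that m = 0.5 and a = 1
    rng = random.Random(seed)
    square_error_mean = 0.0
    square_error_extremes = 0.0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        square_error_mean += (sum(sample) / n - 0.5) ** 2
        square_error_extremes += ((min(sample) + max(sample)) / 2.0 - 0.5) ** 2
    return square_error_mean / trials, square_error_extremes / trials

mse_mean, mse_extremes = simulate_rectangle()
ratio = mse_extremes / mse_mean
print(mse_mean, mse_extremes, ratio)   # the ratio should lie near 6 / n = 0.06
```

Being a sampling experiment, the ratio will vary with the seed and the number of trials, but for n = 100 it should remain in the neighbourhood of 6 per cent.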




For example, we have taken from Vega (14) sets of digits from the table of Natural 
Logarithms to 48 places of decimals. The last block of four digits was taken from the 
logarithms of 100 consecutive numbers from 101 to 200, giving a sample of 100 numbers 
distributed evenly over a limited range. It is sufficient to take the three first digits 
to the nearest integer; then each number has an equal chance of all values between 0 
and 1000. The true mean of the population is 500, and the standard deviation 289. 
The standard error of the mean of a sample of 100 is therefore 28.9. 

Twenty-five such samples were taken, using the last five blocks of digits, for the 
logarithms of numbers from 101 to 600, and the mean determined merely from the highest 
and lowest number occurring, the following values were obtained :— 


           1st hundred.            2nd hundred.            3rd hundred.            4th hundred.            5th hundred.
Digits.    Lowest Highest  m̂ − m   Lowest Highest  m̂ − m   Lowest Highest  m̂ − m   Lowest Highest  m̂ − m   Lowest Highest  m̂ − m

45-48        24     978    +1.0      39     980    +9.5       1     999     0        16     983    −0.5      18     994    +6.0
41-44        35.5   993   +14.0       3     960   −18.5       6     997    +1.5       1     978   −10.5       4     979    −8.5
37-40         9     988    −1.5      11     999    +5.0      31     984    +7.5       4     978    −9.0       2     986    −6.0
33-36         7     995    +1.0      13     997    +5.0       4     998    +1.0       0     994    −3.0       3     981    −8.0
29-32         1     988    −5.5       3     988    −4.5       4     992    −2.0       1     996    −1.5      21     977    −1.0


It will be seen that these errors rarely exceed one-half of the standard error of the
mean of the sample. The actual mean square error of these 25 values is 6.86, while the
calculated value, $\sqrt{50}$, is 7.07. It will therefore be seen that, with samples of only 100,
there is no exaggeration in placing the efficiency of the method of moments as low as
6 per cent. in comparison with the more accurate method, which in this case happens
to be far less laborious.
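The figures quoted for the twenty-five samples can be recomputed from the tabulated errors. The sketch below is illustrative only: the list `errors` holds the twenty-five values of the error as read from the scanned table, and the variable names are not the author's. On these readings the quoted 6.86 appears to correspond to the standard deviation taken about the mean of the twenty-five errors; the root mean square about zero comes out slightly larger.

```python
import math

# Errors of the mean of the extremes (estimate minus 500), read row by row
# from the scanned table; treated here as given data.
errors = [
    1.0, 9.5, 0.0, -0.5, 6.0,
    14.0, -18.5, 1.5, -10.5, -8.5,
    -1.5, 5.0, 7.5, -9.0, -6.0,
    1.0, 5.0, 1.0, -3.0, -8.0,
    -5.5, -4.5, -2.0, -1.5, -1.0,
]

def sd_about_mean(values):
    """Standard deviation about the sample mean, dividing by the number of values."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def rms_about_zero(values):
    """Root mean square of the values about zero."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Large-sample standard deviation a / (n * sqrt(2)) with a = 1000, n = 100.
theoretical_sd = 1000 / (100 * math.sqrt(2))

# Number of errors exceeding half the standard error of the mean (28.9 / 2).
exceeding = sum(1 for e in errors if abs(e) > 28.9 / 2)

print(sd_about_mean(errors), rms_about_zero(errors), theoretical_sd, exceeding)
```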

Such a value for the efficiency of the mean in this case is, however, purely conventional,
since the curve of distribution is outside the region of its valid application, and
the two curves of sampling do not tend to assume the same form. It is, however,
convenient to have an estimate of the effectiveness of statistics for small samples, and
in such cases we should prefer to treat the curve of distribution of the statistic as an
error curve, and to judge the effectiveness of the statistic by the intrinsic accuracy of
the curve as defined in Section 9. Thus the intrinsic accuracy of the curve of distribution
of the mean of all the observations is

$$\frac{12n}{a^{2}},$$

while that of the mean of the extreme values is

$$\frac{4n^{2}}{a^{2}},$$

so yielding a ratio $3/n$. It is probable that this quantity may prove a suitable substitute
for the efficiency of a statistic for curves beyond its region of validity.

To determine the efficiency of the moment coefficients $\beta_1$ and $\beta_2$ in determining the
form of a Pearsonian curve, we must in general apply the method of Section 8 to the
calculation of the simultaneous distribution of the four parameters of those curves when
estimated by the method of maximum likelihood. Expressing the curve by the formula
appropriate to Type IV., we are led to the determinant


r+lr4-2r + 4 

r+\ r + 2 V 

r-f-1 r+2 

r+l V 

a* (r+A +v*) 

a? (r+4“ + v*) 

a{r-^2^ + v^) 

a(r + 2\v^) 

r+\r-\-''2iv 

r-l-1 (2r+4 + v’) 

r+l V 

r+2 + v^ 


a* (?’4-4^ + 1/*) 

a (r + 2^-^v^) 

a (r + 2‘^+v^) 

r+1r+2 
a (t’ + 2 + 

r+1 1 / 

a (r + 2‘‘ + p^) 



1 V 




a(r + 2*+»/*) 

a(r + 2%i/^) 



as the Hessian of —L, when 

F = r e'^mvede. 

Jo 

The ratios of the minors of this determinant to the value of the determinant give 
the standard deviations and correlations of the optimum values of the four parameters 
obtained from a number of large samples. 

In discussing the efficiency of the method of moments in respect of the form of the
curve, it is doubtful if it be possible to isolate in a unique and natural manner, as we
have done in respect of location and scaling, a series of parameters which shall successively
represent different aspects of the process of curve fitting. Thus we might find the
efficiencies with which $r$ and $\nu$ are determined by the method of moments, or those of
the parametric functions corresponding to $\beta_1$ and $\beta_2$, or we might use $\mu_3$ and $\mu_4$ as
independent parameters of form; but in all these cases we should be employing an
arbitrary pair of measures to indicate the relative magnitude of corresponding contour
ellipses of the two frequency surfaces.

For the symmetrical series of curves, the Types II. and VII., the two systems of
ellipses are coaxial, the deviations of $r$ and $\nu$ being uncorrelated; in the case of Type VII.
we put $\nu = 0$, in the determinant given above, which then becomes

I ^'+1^+2 ^ r+l ^ I 

r+4 r-f2 


« 4.(1) 0 


and falls in the two factors 


r r+l fe A*-l \ c f'f'W 1 'ir r+lr +2 e /r\ £+7] 

L2r+4l ' 2 / \2/J r-2*JL 2r + 4 \2/ 

f (gj ~2r+l r + 4 


4 r+1 r+2 

2’|Fr-^)-FfS)k2— 4 ’ 


The corresponding expressions for the method of moments are 

no- ^ ? 7 *^?*~2^(r^+r-f 10) 


JUT ^ = ? r?-—1 r—3 

3' r-5r^T 

Since for moderately large values of r, we have, approximately, 

’•+2’ f (1) -2 ~i m=f (1- ^=.), 
~2’ {f (2^) - f (I)}-2 TTiTTi = 6- 


and 






we have, approximately, for the efficiency of v^, 


r+2 ...) r—1 r—3 r—5 
(r*+r + 10) 2* 


or, when r is great, 
and for the efficiency of 

or, when r is great. 




28-8 


(r + 2^ +i ... )r+l^r~5 r—7 . 
(r^—r+18)rr—!%•—3 


1 - 


53*3 


The following table gives the values of the transcendental quantities required, and
the efficiency of the method of moments in estimating the value of $\nu$ and $r$ from samples
drawn from a Type VII. distribution.

  r.    Transcendental   Efficiency   Transcendental   Efficiency
        quantity.        of ν.        quantity.        of r.

  5     5.31271          0
  6     5.31736          0.2672
  7     5.32060          0.4338       5.9473           0
  8     5.32296          0.5569       5.9574           0.1687
  9     5.32472          0.6449       5.9649           0.3130
 10     5.32607          0.7097       5.9706           0.4403
 11     5.32713          0.7686       5.9760           0.5207
 12     5.32797          0.7963       5.9787           0.5936
 13     5.32866          0.8269       5.9810           0.6619
 14     5.32919          0.8497       5.9839           0.6990
 15                                   5.9853           0.7376
 16                                   5.9870           0.7694
 17                                   5.9883           0.7969
 18                                   5.9896           0.8182


It will be seen that we do not attain to 80 per cent. efficiency in estimating the form
of the curve until $r$ is about 17.2, which corresponds to $\beta_2 = 3.42$. Even for symmetrical
curves higher values of $\beta_2$ imply that the method of moments makes use of
less than four-fifths of the information supplied by the sample.






On the other side of the normal point, among the Type II. curves, very similar formulae 
apply. The fundamental Hessian is 




where r is written for the positive quantity, — r, whence 


'tL(r/ = ' 


2 


and 

Now since 

it follows that 




4 r—1 r—2 



r" 





r-lr-2^\ 


^ 2 ) 

|-F| 

fV— 1 \ 
\ 2 ), 

II 

1 




r—2* F —2r—Ir—4 = 7’—2* F —2rr—3. 

which is the same function of r—4 as 


is of r. 

In a similar manner 


7-+2’’ F (^) -2?*+l ?- + 4 


which is the same function of r—3 as 

r+l*r+2*|F (|)| -2 7+17+4 

is of r. 



In all these functions and those of the following table, $r$ must be substituted as a
positive quantity, although it must not be forgotten that $r$ changes sign as we pass from
Type VII. to Type II., and we have hitherto adhered to the convention that $r$ is to
be taken positive for Type VII. and negative for Type II.


  r.    Transcendental   Efficiency   Transcendental   Efficiency
        quantity.        of ν.        quantity.        of r.

  2     4                0            4                0
  3     4.93480          0.0676       5.1595           0.0431
  4     5.16947          0.2066       5.6648           0.1446
  5     5.23966          0.3590       5.7410           0.2613
  6     5.27678          0.4865       5.8305           0.3708
  7     5.29472          0.5857       5.8813           0.4653
  8     5.30576          0.6615       5.9120           0.5441
  9     5.31271          0.7198       5.9331           0.6090
 10     5.31730          0.7650       5.9473           0.6624
 11     5.32060          0.8005       5.9574           0.7063
 12     5.32296          0.8287       5.9649           0.7427
 13     5.32472          0.8516       5.9700           0.7731
 14     5.32607          0.8702       5.9750           0.7986
 15                                   5.9787           0.8202


In both cases the region of validity is bounded by the rectangle, at the point B
(fig. 2, p. 343). Efficiency of 80 per cent. is reached when $r$ is about 14.1 ($\beta_2 = 2.65$).

Thus for symmetrical curves of the Pearsonian type we may say that the method of
moments has an efficiency of 80 per cent. or more, when $\beta_2$ lies between 2.65 and 3.42.
The limits within which the values of the parameters obtained by moments cannot be
greatly improved are thus much narrower than has been imagined.
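The two limiting values of $r$ can be recovered from the tabulated efficiencies by linear interpolation. The sketch below is illustrative: it uses the last two tabulated efficiencies of $r$ in each table (taking the final row of the Type II. table as $r = 15$), and it assumes the usual kurtosis relations $\beta_2 = 3 + 6/(r-3)$ for Type VII. and $\beta_2 = 3 - 6/(r+3)$ for Type II. These relations are not stated in this section; they are adopted here as an assumption because they reproduce the quoted values 3.42 and 2.65.

```python
def interpolate_r(r_lo, eff_lo, r_hi, eff_hi, target=0.80):
    """Linear interpolation for the value of r at which the efficiency reaches `target`."""
    return r_lo + (target - eff_lo) * (r_hi - r_lo) / (eff_hi - eff_lo)

def beta2_type_vii(r):
    """Assumed relation between r and beta_2 for Type VII. curves."""
    return 3 + 6 / (r - 3)

def beta2_type_ii(r):
    """Assumed relation between (positive) r and beta_2 for Type II. curves."""
    return 3 - 6 / (r + 3)

# Efficiencies of r taken from the two tables (values as read from the scan).
r_vii = interpolate_r(17, 0.7969, 18, 0.8182)
r_ii = interpolate_r(14, 0.7986, 15, 0.8202)

print(r_vii, beta2_type_vii(r_vii))
print(r_ii, beta2_type_ii(r_ii))
```

The interpolated values lie close to the 17.2 and 14.1 given in the text; small differences are to be expected from linear interpolation between rounded table entries.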

11. The Reason for the Efficiency of the Method of Moments in a Small 
Region surrounding the Normal Curve. 

We have seen that the method of moments applied in fitting Pearsonian curves has
an efficiency exceeding 80 per cent. only in the restricted region for which $\beta_2$ lies between
the limits 2.65 and 3.42, and as we have seen in Section 8, for which $\beta_1$ does not exceed
0.1. The contours of equal efficiency are nearly circular or elliptical within these
limits, if the curves are represented as in fig. 2, p. 343, and are ultimately centred round
the normal point, at which point the efficiencies of all parameters tend to 100 per cent.
It was, of course, to be expected that the first two moments would have 100 per cent.
efficiencies at this point, for they happen to be the optimum statistics for fitting
the normal curve. That the moment coefficients $\beta_1$ and $\beta_2$ also tend to 100 per cent.
efficiency in this region suggests that in the immediate neighbourhood of the normal
curve the departures from normality specified by the Pearsonian formula agree with
those of that system of curves for which the method of moments gives the solution of
maximum likelihood.

The system of curves for which the method of moments is the best method of fitting
may easily be deduced, for if the frequency in the range $dx$ be

$$y(x, \theta_1, \theta_2, \theta_3, \theta_4)\, dx,$$

then

$$\frac{\partial}{\partial\theta}\log y$$

must involve $x$ only as polynomials up to the fourth degree; consequently

$$y = e^{-a^{4}\left(x^{4} + p_{1}x^{3} + p_{2}x^{2} + p_{3}x + p_{4}\right)},$$

the convergence of the probability integral requiring that the coefficient of $x^4$ should be
negative, and the five quantities $a, p_1, p_2, p_3, p_4$ being connected by a single relation,
representing the fact that the total probability is unity.

Typically these curves are bimodal, and except in the neighbourhood of the normal
point are of a very different character from the Pearsonian curves. Near this point,
however, they may be shown to agree with the Pearsonian type; for let

$$y = C\, e^{-\frac{x^{2}}{2\sigma^{2}} + k_{1}x^{3} + k_{2}x^{4}}$$

represent a curve of the quartic exponent, sufficiently near to the normal curve for the
squares of $k_1$ and $k_2$ to be neglected, then

$$\frac{d}{dx}\log y = -\frac{x}{\sigma^{2}} + 3k_{1}x^{2} + 4k_{2}x^{3} = -\frac{x}{\sigma^{2}\left(1 + 3k_{1}\sigma^{2}x + 4k_{2}\sigma^{2}x^{2}\right)},$$

neglecting powers of $k_1$ and $k_2$. Since the only terms in the denominator constitute a
quadratic in $x$, the curve satisfies the fundamental equation of the Pearsonian type of
curves. In the neighbourhood of the normal point, therefore, the Pearsonian curves
are equivalent to curves of the quartic exponent; it is to this that the efficiency of
$\mu_3$ and $\mu_4$, in the neighbourhood of the normal curve, is to be ascribed.

12. Discontinuous Distributions. 

The applications hitherto made of the optimum statistics have been problems in
which the data are ungrouped, or at least in which the grouping intervals are so small
as not to disturb the values of the derived statistics. By grouping, these continuous
distributions are reduced to discontinuous distributions, and in an exact discussion must
be treated as such.

If $p_s$ be the probability of an observation falling in the cell $(s)$, $p_s$ being a function of
the required parameters $\theta_1, \theta_2, \ldots$; and in a sample of $N$, if $n_s$ are found to fall into
that cell, then

$$S(\log f) = S(n_s \log p_s).$$

If now we write $m_s = p_s N$, we may conveniently put

$$L = S\left(n_s \log \frac{n_s}{m_s}\right),$$

where $L$ differs by a constant only from the logarithm of the likelihood, with sign
reversed, and therefore the method of the optimum will consist in finding the minimum
value of $L$. The equations so found are of the form

$$\frac{\partial L}{\partial\theta} = -S\left(\frac{n_s}{m_s}\frac{\partial m_s}{\partial\theta}\right) = 0. \qquad (6)$$

It is of interest to compare these formulae with those obtained by making the Pearsonian
$\chi^2$ a minimum.

For

$$\chi^2 = S\left(\frac{(n_s - m_s)^2}{m_s}\right),$$

and therefore

$$N + \chi^2 = S\left(\frac{n_s^2}{m_s}\right),$$

so that on differentiating by $d\theta$, the condition that $\chi^2$ should be a minimum for variations
of $\theta$ is

$$-S\left(\frac{n_s^2}{m_s^2}\frac{\partial m_s}{\partial\theta}\right) = 0. \qquad (7)$$

Equation (7) has actually been used (12) to “improve” the values obtained by the
method of moments, even in cases of normal distribution, and the Poisson series, where
the method of moments gives a strictly sufficient solution. The discrepancy between
these two methods arises from the fact that $\chi^2$ is itself an approximation, applicable
only when $m_s$ and $n_s$ are large, and the difference between them of a lower order of
magnitude. In such cases, writing $n = m + x$,

$$L = S\left(n\log\frac{n}{m}\right) = S\left((m+x)\log\frac{m+x}{m}\right) = S\left(x + \frac{x^2}{2m} - \frac{x^3}{6m^2} + \cdots\right),$$

and since

$$S(x) = 0,$$

we have, when $x$ is in all cases small compared to $m$,

$$L = \tfrac{1}{2}\, S\left(\frac{x^2}{m}\right) = \tfrac{1}{2}\chi^2$$

as a first approximation. In those cases, therefore, when $\chi^2$ is a valid measure of the
departure of the sample from expectation, it is equal to $2L$; in other cases the approximation
fails and $L$ itself must be used.
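The relation between $L$ and $\chi^2$ is easily illustrated numerically. The following sketch is an illustration only; the function names and the two small sets of frequencies are invented for the purpose. In the first set the deviations are small compared with the expectations and $2L$ agrees closely with $\chi^2$; in the second they are not, and the two quantities part company.

```python
import math

def likelihood_L(observed, expected):
    """L = S(n log(n/m)); cells with n = 0 contribute nothing to the sum."""
    return sum(n * math.log(n / m) for n, m in zip(observed, expected) if n > 0)

def chi_square(observed, expected):
    """Pearsonian chi-square, S((n - m)^2 / m)."""
    return sum((n - m) ** 2 / m for n, m in zip(observed, expected))

# Hypothetical frequencies; observed and expected totals agree, so that S(x) = 0.
close_obs, close_exp = [48, 52, 101, 99], [50, 50, 100, 100]
far_obs, far_exp = [1, 9], [5, 5]

for obs, exp in ((close_obs, close_exp), (far_obs, far_exp)):
    print(2 * likelihood_L(obs, exp), chi_square(obs, exp))
```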

The failure of equation (7) in the general problem of finding the best values for the
parameters may also be seen by considering cases of fine grouping, in which the majority
of observations are separated into units. For the formula in equation (6) is equivalent to

$$S\left(\frac{1}{m}\frac{\partial m}{\partial\theta}\right) = 0,$$

where the summation is taken over all the observations, while the formula of
equation (7), since it involves $n_s^2$, changes its value discontinuously, when one
observation is gradually increased, at the point where it happens to coincide with a
second observation.

Logically it would seem to be a necessity that that population which is chosen in
fitting a hypothetical population to data should also appear the best when tested for
its goodness of fit. The method of the optimum secures this agreement, and at the
same time provides an extension of the process of testing goodness of fit, to those cases
for which the $\chi^2$ test is invalid.

The practical value of $\chi^2$ lies in the fact that when the conditions are satisfied in
order that it shall closely approximate to $2L$, it is possible to give a general formula
for its distribution, so that it is possible to calculate the probability, $P$, that in a random
sample from the population considered, a worse fit should be obtained; in such cases
$\chi^2$ is distributed in a curve of the Pearsonian Type III.,

$$df \propto L^{\frac{n'-3}{2}}\, e^{-L}\, dL,$$

where $n'$ is one more than the number of degrees of freedom in which the sample may
differ from expectation (17).

In other cases we are at present faced with the difficulty that the distribution of $L$
requires a special investigation. This distribution will in general be discontinuous (as
is that of $\chi^2$), but it is not impossible that mathematical research will reveal the existence
of effective graduations for the most important groups of cases to which $\chi^2$ cannot
be applied.



We shall conclude with a few illustrations of important types of discontinuous
distribution.


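1. The Poisson Series.

$$e^{-m}\left(1,\; m,\; \frac{m^{2}}{2!},\; \ldots,\; \frac{m^{x}}{x!},\; \ldots\right)$$

involves only the single parameter, and is of great importance in modern statistics.
For the optimum value of $m$,

$$S\left\{\frac{\partial}{\partial m}\left(-m + x\log m\right)\right\} = 0,$$

whence

$$S\left(\frac{x}{m} - 1\right) = 0,$$

or

$$\hat{m} = \bar{x}.$$

The most likely value of $m$ is therefore found by taking the first moment of the series.
Differentiating a second time,

$$-\frac{1}{\sigma^{2}_{\hat{m}}} = S\left(-\frac{x}{m^{2}}\right) = -\frac{n}{m},$$

so that

$$\sigma^{2}_{\hat{m}} = \frac{m}{n},$$

as is well known.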
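The result for the Poisson series may be set out in a few lines of code. This is a minimal sketch with illustrative names and invented counts: the optimum value of $m$ is the mean of the observed counts, its sampling variance is estimated by $m/n$, and the logarithm of the likelihood is seen to be greater at the mean than at neighbouring values.

```python
import math

def poisson_log_likelihood(m, counts):
    """Logarithm of the likelihood of m for a list of observed counts."""
    return sum(-m + x * math.log(m) - math.lgamma(x + 1) for x in counts)

def poisson_optimum(counts):
    """Return the optimum value of m (the mean) and its estimated sampling variance m / n."""
    n = len(counts)
    m_hat = sum(counts) / n
    return m_hat, m_hat / n

counts = [2, 0, 3, 1, 4, 2, 1, 3]  # hypothetical observations
m_hat, variance = poisson_optimum(counts)
print(m_hat, variance)
```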


2. Grouped Normal Data. 


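In the case of the normal curve of distribution it is evident that the second moment
is a sufficient statistic for estimating the standard deviation; in investigating a sufficient
solution for grouped normal data, we are therefore in reality finding the optimum
correction for grouping; the Sheppard correction having been proved only to satisfy
the criterion of consistency.

For grouped normal data we have

$$p_s = \frac{1}{\sigma\sqrt{2\pi}}\int_{x_s}^{x_{s+1}} e^{-\frac{(x-m)^{2}}{2\sigma^{2}}}\, dx,$$

and the optimum values of $m$ and $\sigma$ are obtained from the equations,

$$S\left(\frac{n_s}{p_s}\frac{\partial p_s}{\partial m}\right) = 0, \qquad S\left(\frac{n_s}{p_s}\frac{\partial p_s}{\partial\sigma}\right) = 0;$$

or, if we write,

$$z = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^{2}}{2\sigma^{2}}},$$

with $z_s$ for the value of $z$ at $x = x_s$, we have the two conditions,

$$S\left\{\frac{n_s}{p_s}\left(z_s - z_{s+1}\right)\right\} = 0$$

and

$$S\left\{\frac{n_s}{p_s}\left((x_s - m)\, z_s - (x_{s+1} - m)\, z_{s+1}\right)\right\} = 0.$$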


As a simple example we shall take the case chosen by K. Smith in her investigation of
the variation of $\chi^2$ in the neighbourhood of the moment solution (12).

Three hundred errors in right ascension are grouped in nine classes, positive and
negative errors being thrown together as shown in the following table :—

0″.1 arc      0-1   1-2   2-3   3-4   4-5   5-6   6-7   7-8   8-9

Frequency     114    84    53    24    14     6     3     1     1

The second moment, without correction, yields the value

$$\sigma = 2.282542.$$

Using Sheppard’s correction, we have

$$\sigma = 2.264214,$$

while the value obtained by making $\chi^2$ a minimum is

$$\sigma = 2.355860.$$

If the latter value were accepted we should have to conclude that Sheppard’s correction,
even when it is small, and applied to normal data, might be altogether of the
wrong magnitude, and even in the wrong direction. In order to obtain the optimum
value of $\sigma$, we tabulate the values of $\partial L/\partial\sigma$ in the region under consideration; this may
be done without great labour if values of $\sigma$ be chosen suitable for the direct application
of the table of the probability integral (13, Table II.). We then have the following
values :—

1/σ                    0.43       0.44       0.45       0.46

∂L/∂σ               +15.135     +2.149    −11.098    −24.605

Second difference               −0.261     −0.260

By interpolation,

$$\frac{1}{\hat\sigma} = 0.441624,$$

$$\hat\sigma = 2.26437.$$


We may therefore summarise these results as follows :—

Uncorrected estimate of σ . . . . . . . . . .    2.28254
Sheppard’s correction . . . . . . . . . . . .   −0.01833
Correction for maximum likelihood . . . . . .   −0.01817
“Correction” for minimum χ² . . . . . . . . .   +0.07332

Far from shaking our faith, therefore, in the adequacy of Sheppard’s correction,
when small, for normal data, this example provides a striking instance of its effectiveness,
while the approximate nature of the $\chi^2$ test renders it unsuitable for improving a
method which is already very accurate.
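The uncorrected and the Sheppard-corrected estimates can be recomputed directly from the frequency table. The sketch below is illustrative only; it assumes that each class is represented by its mid-point, that the unit of grouping is one (in units of 0″.1 of arc), and that the second moment is taken about zero since positive and negative errors are thrown together. The variable names are not the author's.

```python
import math

# Frequencies of the nine classes 0-1, 1-2, ..., 8-9 (units of 0".1 arc).
frequencies = [114, 84, 53, 24, 14, 6, 3, 1, 1]
midpoints = [k + 0.5 for k in range(len(frequencies))]

total = sum(frequencies)

# Crude second moment about zero, using the class mid-points.
second_moment = sum(f * x * x for f, x in zip(frequencies, midpoints)) / total

sigma_uncorrected = math.sqrt(second_moment)

# Sheppard's correction subtracts h^2 / 12 from the second moment; here h = 1.
sigma_sheppard = math.sqrt(second_moment - 1.0 / 12.0)

print(total, sigma_uncorrected, sigma_sheppard, sigma_sheppard - sigma_uncorrected)
```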

It will be useful before leaving the subject of grouped normal data to calculate the 
actual loss of efficiency caused by grouping, and the additional loss due to the small 
discrepancy between moments with Sheppard’s correction and the optimum solution. 

To calculate the loss of efficiency involved in the process of grouping normal data, let 




when $a\sigma$ is the group interval, then


«=/(f) +£.ru) + T^/"(f)+ - 


1920 


322,560*' 


=/(f) {i++ tIo ^ "I 


whence 


log t, = + ..., 


2880 


181,440' 


and 


lo, . . -i + MS 


ii. _ ^ ^ -jLLflt!_ 

12 144 1778 25 . 12 ^ 


of which the mean value is 









neglecting the periodic terms; and consequently 


n\ 12 flMoo J 


Now for the mean of ungrouped data

$$\sigma^{2}_{\hat{m}} = \frac{\sigma^{2}}{n},$$

so that the loss of efficiency due to grouping is nearly $a^{2}/12$.


The further loss caused by using the mean of the grouped data is very small, for 


n n V ^ 12/ 


neglecting the periodic terms ; the loss of efficiency by using vi therefore is only 


Similarly for the efficiency for scaling, 




of which the mean value is 


(tU 6 40 270 ■ 


neglecting the periodic terms ; and consequently 


a ■ g-M 1 I a’’ , a* tt* 1 
' 2«l 6 360 10,800 ■■■] 


For ungrouped data

$$\sigma^{2}_{\hat\sigma} = \frac{\sigma^{2}}{2n},$$

so that the loss of efficiency in scaling due to grouping is nearly $a^{2}/6$. This may be made
as low as 1 per cent. by keeping $a$ less than $\tfrac{1}{4}$.

The further loss of efficiency produced by using the grouped second moment with 
Sheppard’s correction is again very small, for 


n »). r 6 360' 


neglecting the periodic terms. 



Whence it appears that the further loss of efficiency is only

$$\frac{a^{6}}{10{,}800}.$$

We may conclude, therefore, that the high agreement between the optimum value of
$\sigma$ and that obtained by Sheppard’s correction in the above example is characteristic
of grouped normal data. The method of moments with Sheppard’s correction is highly
efficient in treating such material, the gain in efficiency obtainable by increasing the
likelihood to its maximum value is trifling, and far less than can usually be gained by
using finer groups. The loss of efficiency involved in grouping may be kept below
1 per cent. by making the group interval less than one-quarter of the standard deviation.
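The leading terms of the loss of efficiency due to grouping, $a^{2}/12$ for location and $a^{2}/6$ for scaling, may be checked by direct summation over the cells. The sketch below is offered under stated assumptions: the population is normal with zero mean and unit standard deviation, the cell boundaries are placed at multiples of the group interval `a`, and the information per observation is computed as the sum over cells of $(\partial p/\partial\theta)^2/p$. The function names are illustrative.

```python
import math

def normal_density(x):
    """Ordinate of the normal curve with zero mean and unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_integral(x):
    """Probability integral of the normal curve up to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def grouped_information(a, half_width=7.0):
    """Information per observation on m and on sigma from data grouped in cells of width a."""
    info_m = 0.0
    info_sigma = 0.0
    k_max = int(math.ceil(half_width / a))
    for k in range(-k_max, k_max):
        lo, hi = k * a, (k + 1) * a
        p = normal_integral(hi) - normal_integral(lo)
        if p <= 0.0:
            continue  # cell too far in the tail to contribute
        dp_dm = normal_density(lo) - normal_density(hi)
        dp_dsigma = lo * normal_density(lo) - hi * normal_density(hi)
        info_m += dp_dm ** 2 / p
        info_sigma += dp_dsigma ** 2 / p
    return info_m, info_sigma

a = 0.25
info_m, info_sigma = grouped_information(a)
loss_location = 1.0 - info_m           # ungrouped information on m is 1
loss_scaling = 1.0 - info_sigma / 2.0  # ungrouped information on sigma is 2
print(loss_location, a * a / 12, loss_scaling, a * a / 6)
```

With the group interval one-quarter of the standard deviation, the loss in scaling comes out at about 1 per cent., in agreement with the statement in the text.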

Although for the normal curve the loss of efficiency due to moderate grouping is very
small, such is not the case with curves making a finite angle with the axis, or having at
an extreme a finite or infinitely great ordinate. In such cases even moderate grouping
may result in throwing away the greater part of the information which the sample
provides.


3. Distribution of Observations in a Dilution Series. 

An important type of discontinuous distribution occurs in the application of the
dilution method to the estimation of the number of micro-organisms in a sample of
water or of soil. The method here presented was originally developed in connection
with Mr. Cutler’s extensive counts of soil protozoa carried out in the protozoological
laboratory at Rothamsted, and although the method is of very wide application, this
particular investigation affords an admirable example of the statistical principles
involved.

In principle the method consists in making a series of dilutions of the soil sample,
and determining the presence or absence of each type of protozoa in a cubic centimetre
of the dilution, after incubation in a nutrient medium.

The series in use proceeds by powers of 2, so that the frequency of protozoa in each
dilution is one-half that in the last.

The frequency at any stage of the process may then be represented by

$$m = \frac{n}{2^{x}},$$

when $x$ indicates the number of dilutions.

Under conditions of random sampling, the chance of any plate receiving 0, 1, 2, 3
protozoa of a given species is given by the Poisson series

$$e^{-m}\left(1,\; m,\; \frac{m^{2}}{2!},\; \frac{m^{3}}{3!},\; \ldots\right),$$

and in consequence the proportion of sterile plates is

$$p = e^{-m},$$

and of fertile plates

$$q = 1 - e^{-m}.$$

In general we may consider a dilution series with dilution factor $a$ so that

$$\log p = -\frac{n}{a^{x}},$$

and assume that $s$ plates are poured from each dilution.

The object of the method being to estimate the number $n$ from a record of the sterile
and fertile plates, we have

$$L = S_1(\log p) + S_2(\log q),$$

when $S_1$ stands for summation over the sterile plates, and $S_2$ for summation over those
which are fertile.

Now

$$\frac{\partial p}{\partial \log n} = -\frac{\partial q}{\partial \log n} = p\log p,$$

so that the optimum value of $n$ is obtained from the equation,

$$\frac{\partial L}{\partial \log n} = S_1(\log p) - S_2\left(\frac{p}{q}\log p\right) = 0.$$

Differentiating a second time,

$$\frac{\partial^{2} L}{\partial(\log n)^{2}} = S_1(\log p) - S_2\left\{\frac{p}{q}\log p\left(1 + \log p + \frac{p}{q}\log p\right)\right\};$$

now the mean number of sterile plates is $ps$, and of fertile plates $qs$, so that the mean
value of $\partial^{2}L/\partial(\log n)^{2}$ is

$$s\, S\left\{p\log p - p\log p\left(1 + \log p + \frac{p}{q}\log p\right)\right\} = -s\, S\left\{\frac{p}{q}(\log p)^{2}\right\},$$

the summation, $S$, being extended over all the dilutions.

It thus appears that each plate observed adds to the weight of the determination
of $\log n$ a quantity

$$w = \frac{p}{q}(\log p)^{2}.$$
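The equation for the optimum value of $n$ has no explicit solution, but since its left-hand side decreases steadily as $\log n$ increases it may be solved by repeated bisection. The following is a minimal sketch, not the procedure used at Rothamsted: the function names, the search limits and the record of plates are all hypothetical, and each dilution is described by its index $x$ and its numbers of sterile and fertile plates.

```python
import math

def score(log_n, dilutions, a=2.0):
    """Derivative of L with respect to log n: S1(log p) - S2((p/q) log p)."""
    n = math.exp(log_n)
    total = 0.0
    for x, sterile, fertile in dilutions:
        log_p = -n / a ** x
        p = math.exp(log_p)
        q = -math.expm1(log_p)  # q = 1 - p, computed without loss of accuracy
        total += sterile * log_p
        if fertile:
            total -= fertile * (p / q) * log_p
    return total

def optimum_log_n(dilutions, a=2.0, lo=-20.0, hi=20.0, iterations=200):
    """Solve score(log n) = 0 by bisection; the score decreases as log n increases."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, dilutions, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical record: (dilution index x, sterile plates, fertile plates), five plates each.
record = [(0, 0, 5), (1, 1, 4), (2, 3, 2), (3, 4, 1), (4, 5, 0)]
estimate = math.exp(optimum_log_n(record))
print(estimate)
```

A solution exists only when the record contains both sterile and fertile plates; otherwise the bisection simply runs to one of its limits.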



We give below a table of the values of $p$, and of $w$, for the dilution series $\log p = -2^{-x}$
from $x = -4$ to $x = 11$.

   x.        p.                 w.           S (w) (per cent.)

   −4        0.00000011254      0.000029          0.001
   −3        0.0003354626       0.021477          0.906
   −2        0.01831564         0.298518         13.485
   −1        0.1353353          0.626071         39.865
    0        0.3678794          0.581977         64.387
    1        0.6065307          0.385374         80.625
    2        0.7788008          0.220051         89.897
    3        0.8824969          0.117350         94.842
    4        0.9394131          0.060567         97.394
    5        0.9692332          0.030764         98.690
    6        0.9844964          0.015503         99.343
    7        0.9922179          0.007782         99.671
    8        0.9961014          0.003899         99.836
    9        0.9980488          0.001951         99.918
   10        0.9990239          0.000976         99.959
   11        0.9995118          0.000488         99.979

   Remainder                    0.000488

   Total                        2.373265

For the same dilution constant the total $S(w)$ is nearly independent of the particular
series chosen, its average value being $\dfrac{\pi^{2}}{6\log a}$, or in this case 2.373138. The fourth
column shows the total weight attained at any stage, expressed as a percentage of that
obtained from an infinite series of dilutions. It will be seen that a set of eight dilutions
comprise all but about 2 per cent. of the weight. With a loss of efficiency of only 2 to
2½ per cent., therefore, the number of dilutions which give information as to a particular
species may be confined to eight. To this number must be added a number depending
on the range which it is desired to explore. Thus to explore a range from 100 to 100,000
per gramme (about 10 octaves) we should require 10 more dilutions, making 18 in all,
while to explore a range of a millionfold, or about 20 octaves, 28 dilutions would be
needed.
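The table of $p$ and $w$ is readily recomputed. The sketch below is an illustration with invented names: it takes the series $\log p = -2^{-x}$ (that is, $n = 1$ and $a = 2$), forms the weight $w = (p/q)(\log p)^{2}$ for each dilution from $x = -4$ to $x = 11$, accumulates the total, and compares it with the average value $\pi^{2}/(6 \log a)$.

```python
import math

def sterile_probability(x, n=1.0, a=2.0):
    """Proportion of sterile plates at dilution x: p = exp(-n / a^x)."""
    return math.exp(-n / a ** x)

def weight(p):
    """Weight contributed by each plate to the determination of log n."""
    return (p / (1.0 - p)) * math.log(p) ** 2

rows = []
running_total = 0.0
for x in range(-4, 12):
    p = sterile_probability(x)
    w = weight(p)
    running_total += w
    rows.append((x, p, w, running_total))

average_total = math.pi ** 2 / (6.0 * math.log(2.0))

for x, p, w, cumulative in rows:
    print(f"{x:3d}  {p:.10f}  {w:.6f}  {100.0 * cumulative / 2.373265:.3f}")
print(average_total)
```

The percentages printed in the last column are taken on the tabulated total 2.373265, which includes the remainder beyond $x = 11$.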

In practice it would be exceedingly laborious to calculate the optimum value of $n$ for
each series observed (of which 38 are made daily). On the advice of the statistical
department, therefore, Mr. Cutler adopted the plan of counting the total number of
sterile plates, and taking the value of $n$ which on the average would give that number.
When a sufficient number of dilutions are made, $\log n$ is diminished by $\dfrac{1}{s}\log a$ for each
additional sterile plate, and even near the ends of the series the appropriate values of
$n$ may easily be tabulated. Since this method of estimation is of wide application,
and appears at first sight to be a very rough one, it is important to calculate its efficiency.






For any dilution the variance in the number of sterile plates is

$$spq,$$

and as the several dilutions represent independent samples, the total variance is

$$s\, S(pq),$$

hence

$$\sigma^{2}_{\log n} = \frac{1}{s}(\log a)^{2}\, S(pq).$$

Now $S(pq)$ has an average value $\dfrac{\log 2}{\log a}$, therefore taking $a = 2$,

$$(\log a)^{2} = 0.480453,$$

and

$$S(pq) = 1,$$

being very nearly constant and within a small fraction of unity; whence the efficiency
of the method of counting the sterile plates is

$$\frac{6}{\pi^{2}\log 2} = 87.71 \text{ per cent.},$$

a remarkably high efficiency, considering the simplicity of the method, the efficiency
being independent of the dilution ratio.
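Both quantities entering the efficiency may be verified in a few lines. The sketch below uses illustrative names and assumes a series of dilutions long enough in both directions to be treated as unlimited. It sums $pq$ over the series for the dilution factor 2 and evaluates $6/(\pi^{2}\log 2)$; the natural logarithm is understood throughout.

```python
import math

def sum_pq(n, a=2.0, x_min=-60, x_max=60):
    """Sum of p*q over a long dilution series, with p = exp(-n / a^x)."""
    total = 0.0
    for x in range(x_min, x_max + 1):
        p = math.exp(-n / a ** x)
        total += p * (1.0 - p)
    return total

def counting_efficiency():
    """Efficiency of estimating n from the total count of sterile plates."""
    return 6.0 / (math.pi ** 2 * math.log(2.0))

print(sum_pq(1.0), sum_pq(3.7), counting_efficiency())
```

For the factor 2 the sum is in fact exactly unity for an unlimited series, since $p^{2}$ at one dilution equals $p$ at the preceding one and the terms cancel in succession.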


13. Summary. 

During the rapid development of practical statistics in the past few decades, the 
theoretical foundations of the subject have been involved in great obscurity. Adequate 
distinction has seldom been drawn between the sample recorded and the hypothetical 
population from which it is regarded as drawn. This obscurity is centred in the so-called 
“ inverse ” methods. 

On the bases that the purpose of the statistical reduction of data is to obtain statistics 
which shall contain as much as possible, ideally the whole, of the relevant information 
contained in the sample, and that the function of Theoretical Statistics is to show how 
such adequate statistics may be calculated, and how much and of what kind is the 
information contained in them, an attempt is made to formulate distinctly the types 
of problems which arise in statistical practice. 

Of these, problems of Specification are found to be dominated by considerations which
may change rapidly during the progress of Statistical Science. In problems of Distribution
relatively little progress has hitherto been made, these problems still affording
a field for valuable enquiry by highly trained mathematicians. The principal purpose
of this paper is to put forward a general solution of problems of Estimation.

Of the criteria used in problems of Estimation only the criterion of Consistency has
hitherto been widely applied; in Section 5 are given examples of the adequate and
inadequate application of this criterion. The criterion of Efficiency is shown to be a
special but important case of the criterion of Sufficiency, which latter requires that the
whole of the relevant information supplied by a sample shall be contained in the statistics
calculated.

In order to make clear the nature of the general method of satisfying the criterion
of Sufficiency, which is here put forward, it has been thought necessary to reconsider
Bayes’ problem in the light of the more recent criticisms to which the idea of “inverse
probability” has been exposed. The conclusion is drawn that two radically distinct
concepts, both of importance in influencing our judgment, have been confused under
the single name of probability. It is proposed to use the term likelihood to designate
the state of our information with respect to the parameters of hypothetical populations,
and it is shown that the quantitative measure of likelihood does not obey the mathematical
laws of probability.

A proof is given in Section 7 that the criterion of Sufficiency is satisfied by that set 
of values for the parameters of which the likelihood is a maximum, and that the same 
function may be used to calculate the efficiency of any other statistics, or, in other 
words, the percentage of the total available information which is made use of by such 
statistics. 

This quantitative treatment of the information supplied by a sample is illustrated by 
an investigation of the efficiency of the method of moments in fitting the Pearsonian 
curves of Type III. 

Section 9 treats of the location and scaling of Error Curves in general, and contains
definitions and illustrations of the intrinsic accuracy, and of the centre of location of such
curves.

In Section 10 the efficiency of the method of moments in fitting the general Pearsonian
curves is tested and discussed. High efficiency is only found in the neighbourhood of
the normal point. The two causes of failure of the method of moments in locating these
curves are discussed and illustrated. The special cause is discovered for the high
efficiency of the third and fourth moments in the neighbourhood of the normal point.

It is to be understood that the low efficiency of the moments of a sample in estimating
the form of these curves does not at all diminish the value of the notation of moments as
a means of the comparative specification of the form of such curves as have finite moment
coefficients.

Section 12 illustrates the application of the method of maximum likelihood to discontinuous
distributions. The Poisson series is shown to be sufficiently fitted by the
mean. In the case of grouped normal data, the Sheppard correction of the crude
moments is shown to have a very high efficiency, as compared to recent attempts to
improve such fits by making $\chi^2$ a minimum; the reason being that $\chi^2$ is an expression
only approximate to a true value derivable from likelihood. As a final illustration of
the scope of the new process, the theory of the estimation of micro-organisms by the
dilution method is investigated.

Finally it is a pleasure to thank Miss W. A. Mackenzie, for her valuable assistance 
in the preparation of the diagrams. 


References. 

(1) K. Pearson (1920). “The Fundamental Problem of Practical Statistics,” ‘Biom.,’ xiii., pp. 1-16.

(2) F. Y. Edgeworth (1921). “Molecular Statistics,” ‘J.R.S.S.,’ lxxxiv., p. 83.

(3) G. U. Yule (1912). “On the Methods of Measuring Association between two Attributes,”
‘J.R.S.S.,’ lxxv., p. 687.

(4) Student (1908). “The Probable Error of a Mean,” ‘Biom.,’ vi., p. 1.

(5) R. A. Fisher (1915). “Frequency Distribution of the Values of the Correlation Coefficient in
Samples from an Indefinitely Large Population,” ‘Biom.,’ x., p. 507.

(6) R. A. Fisher (1921). “On the ‘Probable Error’ of a Coefficient of Correlation deduced from a
Small Sample,” ‘Metron,’ i., pt. iv., p. 82.

(7) R. A. Fisher (1920). “A Mathematical Examination of the Methods of Determining the Accuracy
of an Observation by the Mean Error and by the Mean Square Error,” ‘Monthly Notices of R.A.S.,’
lxxx., p. 758.

(8) E. Pairman and K. Pearson (1919). “On Corrections for the Moment Coefficients of Limited
Range Frequency Distributions when there are finite or infinite Ordinates and any Slopes at the
Terminals of the Range,” ‘Biom.,’ xii., p. 231.

(9) R. A. Fisher (1912). “On an Absolute Criterion for Fitting Frequency Curves,” ‘Messenger of
Mathematics,’ xli., p. 155.

(10) Bayes (1763). “An Essay towards Solving a Problem in the Doctrine of Chances,” ‘Phil. Trans.,’
liii., p. 370.

(11) K. Pearson (1919). “Tracts for Computers. No. 1 : Tables of the Digamma and Trigamma
Functions,” By E. Pairman, Camb. Univ. Press.

(12) K. Smith (1916). “On the ‘best’ Values of the Constants in Frequency Distributions,” ‘Biom.,’
xi., p. 262.

(13) K. Pearson (1914). “Tables for Statisticians and Biometricians,” Camb. Univ. Press.

(14) G. Vega (1764). “Thesaurus Logarithmorum Completus,” p. 643.

(15) K. Pearson (1900). “On the Criterion that a given System of Deviations from the Probable in
the case of a Correlated System of Variables is such that it can be reasonably supposed to have
arisen from Random Sampling,” ‘Phil. Mag.,’ l., p. 157.

(16) K. Pearson and L. N. G. Filon (1898). “Mathematical Contributions to the Theory of Evolution.
IV.—On the Probable Errors of Frequency Constants, and on the influence of Random Selection on
Variation and Correlation,” ‘Phil. Trans.,’ cxci., p. 229.

(17) R. A. Fisher (1922). “The Interpretation of χ² from Contingency Tables, and the Calculation
of P,” ‘J.R.S.S.,’ lxxxv., pp. 87-94.

(18) K. Pearson (1915). “On the General Theory of Multiple Contingency, with special reference to
Partial Contingency,” ‘Biom.,’ xi., p. 146.

(19) K. Pearson (1903). “On the Probable Errors of Frequency Constants,” ‘Biom.,’ ii., p. 273,
Editorial.

(20) W. F. Sheppard (1898). “On the Application of the Theory of Error to Cases of Normal Distribution
and Correlations,” ‘Phil. Trans.,’ A., cxcii., p. 101.

(21) J. M. Keynes (1921). “A Treatise on Probability,” Macmillan & Co., London.



11.699a 


11 


THEORY OF STATISTICAL ESTIMATION* 


AUTHOR’S NOTE 

In this paper an explanation of the theory of estimation is attempted, 
more compact and businesslike than was possible in 1922 (Paper 10). 
In particular, clarity has been gained by distinguishing the theory 
of large samples, in which all estimates are normally distributed, and 
all efficient estimates are equivalent, more emphatically from the 
theory of small samples, in which it is necessary to distinguish
between different methods of estimation, all of which are efficient in 
large samples, and which moreover in finite samples give curves of 
sampling error of various forms, for which the variance is of no 
special interest. 

The transition is effected by recognising that the limiting form in 
large samples of the distribution of efficient statistics supplies an
intrinsic measure of the amount of information, as to any unknown 
parameter, supplied by data of the kind investigated, and that this 
measure is applicable equally to the original data, and to any
proposed estimates calculated from them. A simple proof is given on 
page 11.717 that no gain in intrinsic accuracy can accrue from any 
process of statistical reduction, so that if there is no loss the maximum 
precision has been obtained. The method gives also the simple
condition for zero loss leading both to the special case of sufficient
statistics, and to the possibility of diminishing or obviating such loss 
altogether, by the use of ancillary information. 

From this point of view the paper is a study of the conditions in 
which information is lost. In an important class of cases it is shown 
that, by using the method of maximum likelihood, the loss is less 
than by other efficient methods, and tends to a finite value in large 
samples. Moreover, for large samples a series of ancillary values is 
indicated successively reducing the limiting loss of information to 
quantities of the order of $N^{-1}$, $N^{-2}$, ..., where N is the number of 
observations, or, in general, a measure of the extent of the
observational record. As these quantities are the successive derivatives at 


* Reprinted from Proceedings of the Cambridge Philosophical Society, Vol. XXII, 
Pt. 6, pp. 700-725, 1925. 



11.699b 


its maximum of the likelihood function, the inference is approached, 
though it seems not to be drawn in this paper, that exhaustive
estimation is achieved by a set of statistics which together are capable 
of specifying the entire course of the likelihood function. 

A correction has been made in the analysis on pages 11.721 and 
11.722, in the expression for the additional loss of information using 
efficient statistics, which, unlike that derived from maximising the 
likelihood, are not linear in the frequencies. The correct formulae 
are somewhat simpler than those originally given. 



700 


11.700 


Mr Fisher, Theory of statistical estimation 


Theory of Statistical Estimation. By Mr R. A. Fisher, Gonville 
and Caius College. 


PREFATORY NOTE. 

It has been pointed out to me that some of the statistical ideas employed 
in the following investigation have never received a strictly logical definition 
and analysis. The idea of a frequency curve, for example, evidently implies 
an infinite hypothetical population distributed in a definite manner; but 
equally evidently the idea of an infinite hypothetical population requires a 
more precise logical specification than is contained in that phrase. The same 
may be said of the intimately connected idea of random sampling. These 
ideas have grown up in the minds of practical statisticians and lie at the basis 
especially of recent work; there can be no question of their pragmatic value. 
It was no part of my original intention to deal with the logical bases of these 
ideas, but some comments which Dr Burnside has kindly made have convinced 
me that it may be desirable to set out for criticism the manner in which I 
believe the logical foundations of these ideas may be established. 

The idea of an infinite hypothetical population is, I believe, implicit in all 
statements involving mathematical probability. If, in a Mendelian experiment, 
we say that the probability is one half that a mouse born of a certain mating 
shall be white, we must conceive of our mouse as one of an infinite population of 
mice which might have been produced by that mating. The population must 
be infinite for in sampling from a finite population the fact of one mouse being 
white would affect the probability of others being white, and this is not the 
hypothesis which we wish to consider; moreover, the probability may not 
always be a rational number. Being infinite the population is clearly
hypothetical, for not only must the actual number produced by any parents be 
finite, but we might wish to consider the possibility that the probability should 
depend on the age of the parents, or their nutritional conditions. We can, 
however, imagine an unlimited number of mice produced upon the conditions 
of our experiment, that is, by similar parents, of the same age, in the same 
environment. The proportion of white mice in this imaginary population 
appears to be the actual meaning to be assigned to our statement of
probability. Briefly, the hypothetical population is the conceptual resultant of the 
conditions which we are studying. The probability, like other statistical 
parameters, is a numerical characteristic of that population. 

We only need the conception of an infinite hypothetical population, in 
connection with random sampling. The ultimate logical elucidation of the 
one idea implies that of the other. Also, the word infinite is to be taken in its 
proper mathematical sense as denoting the limiting conditions approached by 
increasing a finite number indefinitely. I imagine that an exact meaning can 
be given to all the ideas required by some process such as the following. 

Imagine a population of N individuals belonging to s classes, the number 
in class k being $p_k N$. This population can be arranged in order in N! ways. 
Let it be so arranged and let us call the first n individuals in each arrangement 
a sample of n. Neglecting the order within the sample, these samples can be 
classified into the several possible types of sample according to the number 
of individuals of each class which appear. Let this be done, and denote the 
proportion of samples which belong to type j by $q_j$, the number of types being t. 
Consider the following proposition. 

Given any series of proper fractions $P_1, P_2, \ldots, P_s$, such that $S(P_k) = 1$, 



11.701 



and any series of positive numbers $\eta_1, \eta_2, \ldots, \eta_t$, however small, it is possible 
to find a series of proper fractions $Q_1, Q_2, \ldots, Q_t$, and a series of positive 
numbers $\epsilon_1, \epsilon_2, \ldots, \epsilon_s$, and an integer $N_0$, such that, if 

$$N > N_0$$

and $|p_k - P_k| < \epsilon_k$ for all values of k, 

then will $|q_j - Q_j| < \eta_j$ for all values of j. 

I imagine it possible to provide a rigorous proof of this proposition, but 
I do not propose to do so. If it be true, we may evidently speak without 
ambiguity or lack of precision of an infinite population characterised by the 
proper fractions, P, in relation to the random sampling distribution of samples 
of a finite size n. 

It will be noticed that I provide no definition of a random sample, and it 
is not necessary to do so. What we have to deal with in all cases is a random 
sampling distribution of samples, and it is only as a typical member of such 
a distribution that a random sample is ever considered. 

When in 1921 the author put forward in the Phil. Trans. a 
paper (1)* on mathematical statistics he was principally concerned, 
in respect of problems of estimation, with the practical importance 
of making estimates of high efficiency, i.e. of using statistics 
which embody a large proportion of the relevant information 
available in the data, and which ignore, or reject along with the 
irrelevant information, only a small proportion of that which is 
relevant. Many of the properties of efficient statistics, such as that 
even moderate inefficiency of estimation will vitiate tests of 
goodness of fit, were at that time unknown, and the further 
discrimination among statistics within the efficient group, a 
discrimination which is essential to the advance of the theory 
of small samples, was left in much obscurity. Further work along 
the lines of the 1921 paper has, however, cleared up the main 
outstanding difficulties, and seems to make possible a theory of 
statistical estimation with some approach to logical completeness. 

1. The problem of estimation. 

Any body of numerical observations, or qualitative data thrown 
into a numerical form as frequencies, may be interpreted as a 
random sample of some infinite hypothetical population of possible 
values. Problems of estimation arise when we know, or are willing 
to assume, the form of the frequency distribution of the population, 
as a mathematical function involving one or more unknown 
parameters, and wish to estimate the values of these parameters 
by means of the observational record available. A statistic may 
be defined as a function of the observations designed as an estimate 
of any such parameter. The primary qualifications of satisfactory 
statistics may most readily be seen by their behaviour when 
derived from large samples. 

A statistic, to be of any service, must tend to some fixed value 

* See numbered list of references on p. 725. 



11.702 



as the number in the sample is increased; more precisely if T be 
any statistic calculated from a sample of n observations, there 
must be a limiting value $T_\infty$ such that if $\epsilon$ be any positive number 
however small, the frequency (or probability) with which $|T - T_\infty|$ 
exceeds $\epsilon$, tends to zero as n tends to infinity. One example will 
suffice to illustrate the class of statistics which fail to fulfil this 
condition. 

If the frequency with which the variate x falls into the range 
dx, be given by 

$$df = \frac{1}{\pi} \frac{dx}{1 + (x - m)^2},$$

where m is the unknown parameter representing the centre of the 
symmetrical frequency curve of x, then it is not difficult to show 
that the arithmetic mean of any number of independent values of 
x, will be distributed in exactly the same distribution as a single 
value of x. If the observational material consisted of 1000 values 
of x, we should be able from it to estimate the value of m with 
some precision; but if we were to replace the actual observations 
by their mean, our action would be equivalent to discarding 999 
of the observations, and retaining one of them chosen at random. 
Clearly the mean is a useless statistic for our purpose in that it 
does not tend to a fixed value as the size of the sample is increased. 
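
The failure of the mean for this curve can be checked by simulation. The following is a minimal sketch, not part of the original paper: it assumes samples are drawn by inverse-transform sampling from the curve $df = dx / \pi\{1 + (x - m)^2\}$, and it uses the interquartile range as the measure of spread, since the variance of this curve is not finite. The function names, seed and trial counts are illustrative choices.

```python
import math
import random
import statistics


def cauchy_sample(m, n, rng):
    """Draw n values from the curve df = dx / (pi * (1 + (x - m)^2))."""
    # Inverse-transform sampling: x = m + tan(pi * (u - 1/2)), u uniform on [0, 1).
    return [m + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def interquartile_range(values):
    """Distance between the upper and lower quartiles of a list of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def spread_of_statistic(stat, m, n, trials, rng):
    """Interquartile range of a statistic over repeated samples of size n."""
    return interquartile_range(
        [stat(cauchy_sample(m, n, rng)) for _ in range(trials)]
    )


rng = random.Random(1925)
for n in (1, 10, 100):
    iqr_mean = spread_of_statistic(statistics.fmean, 0.0, n, 1000, rng)
    iqr_median = spread_of_statistic(statistics.median, 0.0, n, 1000, rng)
    print(f"n={n:4d}  spread of mean={iqr_mean:.2f}  spread of median={iqr_median:.2f}")
```

Under these assumptions the spread of the mean should stay near that of a single observation (whose quartiles lie at m - 1 and m + 1) however large n becomes, while the spread of the median shrinks as n grows.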

2. Consistent statistics. 

When T tends to a limiting value $T_\infty$, the latter will be some 
determinate function of the unknown parameters. If, therefore, 
T is to be used for purposes of estimation, it must be equated to 
one particular parameter, or function of the parameters, and if it 
is equated to some other function its use will be inconsistent, 
though perhaps approximately accurate. A statistic is said to be 
a consistent estimate of any parameter, if when calculated from 
an indefinitely large sample it tends to be accurately equal to that 
parameter. The criterion of consistency has been widely used in 
the development of statistical methods, and too often it has been 
the only criterion employed. For example the “method of moments” 
consists merely in evaluating a number of arbitrarily chosen 
statistics, and equating as many of them as may be necessary to 
the corresponding series of parametric functions. Estimates of the 
parameters may be obtained from these equations, but they are 
often estimates of little value. In the example given above we 
have shown how little value has the mean, the first moment, in 
locating a particular curve, one of the Pearsonian types, in fitting 
which the method of moments has been so extensively used. 

In a special group of cases the criterion of consistency is 
adequate alone to give a complete solution. If the number of 



11.703 



frequency classes is only one greater than the number of adjustable 
parameters, then for each parameter there is only one consistent 
statistic, and this of course is the one which must be used. Generally, 
however, there are a great number of possible statistics available, 
all of them consistent, but by no means all of equal value. 

3. Efficient statistics. 

In a large and important class of consistent statistics the 
random sampling distribution tends to the normal (Gaussian) form 
as the size of the sample is increased, and in such a way that the 
variance (the square of the standard deviation) falls off inversely 
to the size of the sample. In such cases the characteristics of any 
particular statistic, for large samples, are completely specified by 
(i) its bias, and (ii) its variance. The question of bias is only of 
preliminary interest; if b is the mean value of $T - T_\infty$, then for 
consistent statistics b must tend to zero with increasing samples. 
If we wish to make tests of significance for the deviation of T from 
some hypothetical value, then b must fall off more rapidly than $n^{-1/2}$; 
if, finally, we wish to use mean values of T from a number of finite 
samples, then b must be actually zero, or at least a small quantity 
of an order determined by the number of such samples to be used. 
In any case a knowledge of the exact form of the distribution of 
T will enable us to eliminate any disadvantages from which a 
statistic might seem to suffer by reason of bias. 

Such knowledge is, however, of no avail to repair the defects 
of a statistic in respect of variance. The criterion of efficiency 
requires that the fixed value to which the variance of a statistic 
(of the class of which we are speaking) multiplied by n, tends, shall 
be as small as possible. An efficient statistic is one for which this 
criterion is satisfied. If we know the variance of any efficient 
statistic and that of any other statistic under discussion, then the 
efficiency of the latter may be calculated from the ratio of the two 
values. The efficiency of a statistic represents the fraction, of the 
relevant information available, actually utilised, in large samples, 
by the statistic in question. 

For example, in estimating the value of the standard deviation 
of a normal distribution from a sample of n values, two methods 
have frequently been employed. If 

$$s_1 = \frac{1}{n} \sqrt{\frac{\pi}{2}}\, S(|x - \bar{x}|),$$

where S stands for summation over the sample, $s_1$ is an estimate 
of the true value $\sigma$, based on the method of the mean error. It 
has been shown (2) that the mean value of $s_1$ in random samples is 

$$\sigma \sqrt{\frac{n-1}{n}},$$





11.704 




while the variance of $s_1$ is 

$$\frac{\sigma^2 (n-1)}{n^2} \left\{ \frac{\pi}{2} + \sqrt{n(n-2)} - n + \sin^{-1} \frac{1}{n-1} \right\}.$$

If, also, $s_2$ is given by the equation of the mean square error 

$$n s_2^2 = S(x - \bar{x})^2,$$

then the mean value of $s_2$ is 

$$\sigma \sqrt{\frac{2}{n}}\, \frac{\left(\frac{n-2}{2}\right)!}{\left(\frac{n-3}{2}\right)!}$$

(where x! is used as equivalent to $\Gamma(x + 1)$, whether x is an 
integer or not); while the variance of $s_2$ is 

$$\sigma^2 \left\{ \frac{n-1}{n} - \frac{2}{n} \left[ \frac{\left(\frac{n-2}{2}\right)!}{\left(\frac{n-3}{2}\right)!} \right]^2 \right\}.$$

The latter happens to be an efficient statistic, and for large samples 
the variance reduces to $\sigma^2/2n$, while for large samples the variance of 
$s_1$ reduces to 

$$\frac{\sigma^2}{2n} (\pi - 2).$$

Evidently $s_1$ is not an efficient statistic, but has an efficiency of 
nearly 88 per cent. From a body of 800 observations it will derive 
an estimate of about the same value, as that obtained by $s_2$ from 
700 observations. That is to say the behaviour of the two statistics 
for large samples indicates that about one-eighth of the information 
available is rejected if $s_1$ is used, while if $s_2$ is employed the whole 
is retained. An exact knowledge of the distribution of $s_1$ would 
not enable us to recover the lost information, for if the sample 
were increased without limit, and consequently the distribution 
brought infinitely near to the normal form, nevertheless the 
fraction of information lost tends to a fixed value. 
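
The figure of nearly 88 per cent can be checked numerically. The sketch below is an editorial illustration, not part of the original paper. It assumes the exact variances to be $\sigma^2 (n-1) n^{-2} \{\pi/2 + \sqrt{n(n-2)} - n + \sin^{-1} 1/(n-1)\}$ for the mean-error estimate and $\sigma^2 \{(n-1)/n - (2/n)[\Gamma(n/2)/\Gamma((n-1)/2)]^2\}$ for the mean-square-error estimate, and it takes the efficiency as the ratio of the two variances, which ignores the small difference in bias in finite samples. The function names are hypothetical.

```python
import math


def var_s1(n, sigma=1.0):
    """Variance of the mean-error estimate s1 = sqrt(pi/2) * S|x - xbar| / n."""
    bracket = math.pi / 2 + math.sqrt(n * (n - 2)) - n + math.asin(1 / (n - 1))
    return sigma**2 * (n - 1) / n**2 * bracket


def var_s2(n, sigma=1.0):
    """Variance of the mean-square-error estimate s2 = sqrt(S(x - xbar)^2 / n)."""
    # ((n-2)/2)! / ((n-3)/2)! equals Gamma(n/2) / Gamma((n-1)/2);
    # logarithms of the gamma function avoid overflow for large n.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    squared_mean = (2 / n) * math.exp(2 * log_ratio)
    return sigma**2 * ((n - 1) / n - squared_mean)


def efficiency_of_s1(n):
    """Ratio of the variance of s2 to that of s1 for samples of n."""
    return var_s2(n) / var_s1(n)


for n in (10, 100, 1000):
    print(n, round(efficiency_of_s1(n), 4))

# Large-sample limit quoted in the text: 1 / (pi - 2).
print("limit", round(1 / (math.pi - 2), 4))
```

For large n the ratio should approach $1/(\pi - 2)$, in agreement with the statement that about one-eighth of the information is rejected by the mean-error method.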


4. Properties of efficient statistics. 

Some simple properties of efficient statistics may be derived 
directly from their definition, e.g. their correlational properties (8). 
The correlation between any two statistics, both efficient estimates 
of the same parameter, tends to +1 as the sample is increased. 



11.705 




For if A and B be two such statistics, let the variance of each be 
$\sigma^2/n$ and the correlation between them be r; also let 

$$C = \tfrac{1}{2}(A + B),$$

then C will be a statistic providing a consistent estimate of the 
same parameter, but the variance of C is 

$$\frac{\sigma^2}{n} \cdot \frac{1 + r}{2},$$

and this by hypothesis must not be less than $\sigma^2/n$, therefore r 
cannot be less than +1; but r cannot be greater than +1; therefore 

$$r = +1.$$

For large samples therefore all efficient statistics are equivalent, 
and if in practical work we were only concerned with infinitely 
large samples, the theory of estimation would not require develop¬ 
ment beyond the stipulation that statistics should be efficient. 

If A is an efficient statistic, and B is an inefficient estimate of 
the same parameter with efficiency equal to E, then in large 
samples the correlation between A and B tends to a limit $r = +\sqrt{E}$. 
For if from them we compound a new statistic C, such that 

$$(1 + E - 2r\sqrt{E})\, C = (1 - r\sqrt{E})\, A + (E - r\sqrt{E})\, B,$$

then C will be an estimate of the same parameter with variance 

$$\frac{\sigma^2}{n} \cdot \frac{1 - r^2}{(1 - r^2) + (r - \sqrt{E})^2},$$

and if r does not tend to the limiting value $+\sqrt{E}$, this will be 
less than the variance of A, which is impossible; hence $r = +\sqrt{E}$. 
It should be noted that in making a new statistic with variance 
as low as that of A, when $r = +\sqrt{E}$, the above equation for C 
degenerates into C = A. In other words if we have an efficient 
statistic and an inefficient estimate of the same parameter, the 
best use we can make of these two values, at any rate with large 
samples, will be to ignore the latter entirely. Any compound of 
the two will be less efficient than A. 
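
The algebra of the compound statistic can be verified numerically. The following is an illustrative sketch, not part of the original paper: variances are expressed in units of the variance of A, so that the variance of B is 1/E and the covariance of A and B is $r/\sqrt{E}$. The function names and the trial values E = 0.64, r = 0.5 are assumptions chosen for the example.

```python
import math


def compound_weights(E, r):
    """Weights of A and B in the compound C, from the defining equation."""
    root = math.sqrt(E)
    total = 1 + E - 2 * r * root
    return (1 - r * root) / total, (E - r * root) / total


def compound_variance(E, r):
    """Variance of C, in units of Var(A), from the closed form in the text."""
    return (1 - r**2) / ((1 - r**2) + (r - math.sqrt(E)) ** 2)


def direct_variance(E, r, weight_a):
    """Variance of weight_a * A + (1 - weight_a) * B, computed term by term."""
    weight_b = 1 - weight_a
    return (
        weight_a**2
        + weight_b**2 / E
        + 2 * weight_a * weight_b * r / math.sqrt(E)
    )


# A correlation below sqrt(E) would give a compound better than A itself.
wa, wb = compound_weights(0.64, 0.5)
print(wa, wb, compound_variance(0.64, 0.5), direct_variance(0.64, 0.5, wa))

# At r = sqrt(E) the compound degenerates into A alone.
print(compound_weights(0.64, 0.8), compound_variance(0.64, 0.8))
```

With r below $\sqrt{E}$ the compound has variance less than that of A, which is the contradiction used in the argument; at $r = \sqrt{E}$ the weight of B vanishes and the variance equals that of A.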

For example, if a quantity x be normally distributed with 
variance $\sigma^2$, then it is well known that the mean of a sample of 
n is also distributed normally with variance $\sigma^2/n$. The mean in 
this case is an efficient statistic; the median is a second statistic 
which may be used to locate the curve. If 

$$\psi(a) = \sqrt{\frac{2}{\pi}}\, \frac{1}{\sigma} \int_0^a e^{-x^2/2\sigma^2}\, dx,$$

then if a is the central value of a sample of n values (n being odd) 




11.706 




it appears that the probability that a shall fall into the range 
da is 

$$\frac{n!}{\left\{ \left( \frac{n-1}{2} \right)! \right\}^2}\, \frac{1}{2^{n-1}}\, \frac{1}{\sigma \sqrt{2\pi}}\, e^{-a^2/2\sigma^2} \left( 1 - \psi^2 \right)^{\frac{n-1}{2}} da.$$

When n is large, $\psi(a)$ in this expression must be small, and 
may be replaced by 

$$\frac{a}{\sigma} \sqrt{\frac{2}{\pi}},$$

and the factor involving $\psi$ by 

$$e^{-\frac{(n-1) a^2}{\pi \sigma^2}},$$

so that the variance of a for large samples multiplied by n tends 
to the limit 

$$\frac{\pi \sigma^2}{2}.$$

The efficiency of the median in locating the normal curve is 
therefore 

$$E = \frac{2}{\pi} = 63.66 \text{ per cent.};$$

from this value may be deduced the correlation, in large samples, 
between the mean and the median derived from the same sample 

$$r = \sqrt{E} = .7979.$$

The median thus utilises about 64 per cent. of the information 
provided by the sample, its correlation with the mean of the same 
sample is about .8, but any value obtained by combining the values 
of the median and the mean will result in an estimate inferior to 
that given by the mean. 
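
Both figures can be examined by simulation. The sketch below is an editorial illustration, not part of the original paper; the sample size of 101, the number of trials and the seed are arbitrary assumptions, and in a finite sample the results will agree with the limiting values $2/\pi$ and $\sqrt{2/\pi}$ only approximately.

```python
import math
import random
import statistics


def simulate_mean_and_median(n, trials, rng, sigma=1.0):
    """Mean and median of repeated normal samples of odd size n."""
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return means, medians


def correlation(xs, ys):
    """Product-moment correlation between two equally long lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1925)
means, medians = simulate_mean_and_median(n=101, trials=4000, rng=rng)

# Efficiency of the median: variance of the mean over variance of the median.
efficiency = statistics.pvariance(means) / statistics.pvariance(medians)
r = correlation(means, medians)

print(f"efficiency of median ~ {efficiency:.3f}  (limit 2/pi = {2 / math.pi:.3f})")
print(f"correlation          ~ {r:.3f}  (limit sqrt(2/pi) = {math.sqrt(2 / math.pi):.3f})")
```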

A further consequence of this relation between the efficiency 
of a statistic and its correlation with any efficient statistic, is that 
if A be any efficient statistic, and B any inefficient statistic, then 
the correlation between A and (B - A) is zero. We 
may thus divide the error of B into two parts, which in large 
samples at least will be independent; the first part is equal to 
the error in A, and is the error of random sampling properly so 
called; the second B — A is not properly speaking an error of 
random sampling, but an error of estimation. It is the property 
of efficient statistics that, when applied to large samples, they 
shall have no errors of estimation of order comparable with the 
errors of random sampling. 




11.707 



In all tests of significance an observed deviation is compared 
with the random sampling variation to be anticipated; in tests of 
goodness of fit in particular the "expectation" from which the 
deviations are measured is usually the product of a process of 
estimation, the basis of which is the actual sample of observations 
with which the expectation is to be compared. If, therefore, the 
process of estimation employed involves errors of the same order 
as the errors of random sampling the test of goodness of fit will 
be vitiated; the apparent discrepancy between observation and 
hypothesis will in fact involve errors of estimation of the same order 
as the errors of random sampling to which it is to be compared. 
The effects of such errors upon tests of goodness of fit have been 
shown in more detail in (3). 

5. Derivation of efficient statistics. 

To discover the efficiency of any statistic it is necessary that 
we should have found at least one statistic efficient for the estima¬ 
tion of the same parameter, and should know the variance in 
large samples of the latter. We shall see that the method of maxi¬ 
mum likelihood will always provide a statistic which, if normally 
distributed in large samples with variance falling off inversely to 
the sample number, will be an efficient statistic. The variance in 
large samples of such solutions may be obtained directly from the 
equations by which they were obtained (1). 

For example, if we have a number of observations drawn from 
a population, of which the distribution is given by 

$$df = \frac{1}{\pi} \frac{dx}{1 + (x - m)^2},$$

and wish from the observations to obtain an estimate of the value 
of m, we may write down in terms of m the actual probability of 
such a sample as ours occurring. This probability will be 

$$\pi^{-n}\, dx_1\, dx_2 \cdots dx_n\, \{1 + (x_1 - m)^2\}^{-1} \{1 + (x_2 - m)^2\}^{-1} \cdots \{1 + (x_n - m)^2\}^{-1}.$$

The likelihood of any value of m, in relation to such a sample, 
is defined as a quantity, of which the maximum value is unity, and 
which shall be proportional to the above probability. It is therefore 
independent of the elements $dx_1 \ldots dx_n$ which enter into the 
probability, but which do not involve m. Likelihood in this sense 
is not a synonym for probability, and is a quantity which does 
not obey the laws of probability; it is a property of the values of 
the parameters, which can be determined from the observations 
without antecedent knowledge. An exact knowledge of the
likelihood of different values of m tells us nothing whatever about the 
probability that m will fall in any given range. 



11.708 


If we write for simplicity, 

$$S(\log f) = L,$$

then $$\frac{\partial L}{\partial m} = 0$$

is the equation of maximum likelihood, the solution of which 
gives an estimate of m, which we shall write $\hat{m}$. In the example 
before us this reduces to 

$$S\left\{ \frac{2(x - \hat{m})}{1 + (x - \hat{m})^2} \right\} = 0.$$

To find the value of the variance of $\hat{m}$ derived from a large 
sample, it is only necessary to differentiate a second time; 

$$\frac{\partial^2 L}{\partial m^2} = S\left\{ \frac{2(x - m)^2 - 2}{\{1 + (x - m)^2\}^2} \right\},$$

and for large samples the value of the right-hand side divided by n, 
tends to the limiting value $-\tfrac{1}{2}$. If $V(\hat{m})$ is the variance of $\hat{m}$, we 
therefore equate 

$$-\frac{n}{2} = -\frac{1}{V(\hat{m})},$$

or 

$$V(\hat{m}) = \frac{2}{n}.$$


Knowing this value it is easy to determine the efficiency of any 
other proposed statistic; in particular, since the equations of 
maximum likelihood do not always lend themselves to direct 
solution, it is of importance that, starting with an inefficient 
estimate, we can, by a single process of approximation, obtain an 
efficient estimate. 

For example, if the median of the above distribution, $m_1$, be 
chosen as starting point, it is easy to show that the variance of 
$m_1$ in large samples is 

$$\frac{\pi^2}{4n},$$

so that its efficiency is $8/\pi^2$. The median will differ from the 
maximum likelihood solution by errors of estimation, of which the 
variance will be 

$$\frac{\pi^2 - 8}{4n}.$$

It is sufficient for our purpose that the error of estimation is 
of the order $n^{-1/2}$. If now we evaluate 

$$S\left\{ \frac{2(x - m_1)}{1 + (x - m_1)^2} \right\}$$



11.709 




from the observations, and calculate a new estimate $m_2$ from the 
equation 

$$m_2 = m_1 + \frac{2}{n}\, S\left\{ \frac{2(x - m_1)}{1 + (x - m_1)^2} \right\},$$

it is easy to see that the error of estimation of $m_2$ will be of the 
order $n^{-1}$, and therefore that $m_2$ will be an efficient statistic. 
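
The single process of approximation described above may be sketched as follows. This is an editorial illustration under stated assumptions, not the author's own computation: it takes the correction to be $m_2 = m_1 + (2/n)\, S\{2(x - m_1)/(1 + (x - m_1)^2)\}$ with $m_1$ the sample median, draws samples from the curve $df = dx/\pi\{1 + (x - m)^2\}$ by inverse-transform sampling, and compares n times the sampling variance of the two estimates with the limiting values $\pi^2/4$ and 2. The function names, sample size and seed are hypothetical.

```python
import math
import random
import statistics


def cauchy_sample(m, n, rng):
    """Draw n values from the curve df = dx / (pi * (1 + (x - m)^2))."""
    return [m + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def score(xs, m):
    """First derivative of L = S(log f) with respect to m."""
    return sum(2 * (x - m) / (1 + (x - m) ** 2) for x in xs)


def one_step_estimate(xs):
    """Median corrected by one step, using the limiting value n/2 for -d2L/dm2."""
    m1 = statistics.median(xs)
    return m1 + (2 / len(xs)) * score(xs, m1)


rng = random.Random(1925)
n, trials = 201, 3000
medians, improved = [], []
for _ in range(trials):
    xs = cauchy_sample(0.0, n, rng)
    medians.append(statistics.median(xs))
    improved.append(one_step_estimate(xs))

n_var_median = n * statistics.pvariance(medians)
n_var_improved = n * statistics.pvariance(improved)
print(f"n * variance of median    ~ {n_var_median:.2f}  (limit pi^2/4 = {math.pi**2 / 4:.2f})")
print(f"n * variance of corrected ~ {n_var_improved:.2f}  (limit 2)")
```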


6. Intrinsic accuracy of error curves. 

The variance of efficient statistics from a distribution of any 
form affords us a measure of an important property of the
distribution itself. The fact that from a large sample of n it is possible 
to obtain an estimate of the value of a parameter with variance 
2/n, shows that regarded as an error curve the above distribution 
is intrinsically of the same accuracy as, for example, a normal error 
curve with variance 2. We may thus obtain a measure of the 
intrinsic accuracy of an error curve, and so compare together 
curves of entirely different form. If the variance of an efficient 
estimate derived from a large sample of n is A/n, then the intrinsic 
accuracy of the distribution is defined as 1/A. 

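If a frequency curve is defined by 

$$df = y\, dx$$

where y is a function of a parameter $\theta$, then the intrinsic accuracy 
of the curve, as a means of estimating $\theta$, is 

$$-\int y\, \frac{\partial^2}{\partial\theta^2} (\log y)\, dx$$

over the whole range of possible values. Since 

$$y\, \frac{\partial^2}{\partial\theta^2} (\log y) = \frac{\partial^2 y}{\partial\theta^2} - \frac{1}{y} \left( \frac{\partial y}{\partial\theta} \right)^2,$$

while the integral of the first term over all possible values must 
vanish, the intrinsic accuracy may equally be written 

$$\int \frac{1}{y} \left( \frac{\partial y}{\partial\theta} \right)^2 dx$$

over all possible values; in this form it is clearly seen to be 
necessarily positive. 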
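
The integral form of the intrinsic accuracy lends itself to a direct numerical check. The following sketch is an editorial illustration, not part of the original paper: it approximates $\int (1/y)(\partial y/\partial\theta)^2\, dx$ by the midpoint rule, with the derivative taken by a central difference, and applies it to the curve $dx/\pi\{1 + (x - m)^2\}$ and to a normal curve of variance 2. The integration limits, step counts and function names are assumptions of the example.

```python
import math


def intrinsic_accuracy(density, theta, lo, hi, steps=100_000, h=1e-5):
    """Midpoint-rule approximation to the integral of (1/y) (dy/dtheta)^2 dx."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        y = density(x, theta)
        if y > 0.0:
            # Central difference for the derivative with respect to theta.
            dy = (density(x, theta + h) - density(x, theta - h)) / (2 * h)
            total += dy * dy / y * dx
    return total


def cauchy_density(x, m):
    """Ordinate of the curve df = dx / (pi * (1 + (x - m)^2))."""
    return 1.0 / (math.pi * (1.0 + (x - m) ** 2))


def normal_density_variance_2(x, m):
    """Ordinate of a normal error curve with mean m and variance 2."""
    return math.exp(-((x - m) ** 2) / 4.0) / math.sqrt(4.0 * math.pi)


print(intrinsic_accuracy(cauchy_density, 0.0, -200.0, 200.0))
print(intrinsic_accuracy(normal_density_variance_2, 0.0, -40.0, 40.0))
```

Both values should come out close to one half, in agreement with the remark that the first curve is intrinsically of the same accuracy as a normal error curve with variance 2.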

What we have spoken of as the intrinsic accuracy of an error 
curve may equally be conceived as the amount of information in 
a single observation belonging to such a distribution. If for 
instance two independent observations were available from the 
same or different distributions, the distribution of the pair of 
values would be 


$$df = y y'\, dx\, dx',$$



11.710 



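and the intrinsic accuracy of such a pair would be 

$$-\iint y y' \left\{ \frac{\partial^2}{\partial\theta^2} \log y + \frac{\partial^2}{\partial\theta^2} \log y' \right\} dx\, dx',$$

which, with the identities, 

$$\int y\, dx = \int y'\, dx' = 1,$$

reduces to $$-\int y\, \frac{\partial^2}{\partial\theta^2} \log y \, dx - \int y'\, \frac{\partial^2}{\partial\theta^2} \log y' \, dx';$$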

the amount of information provided by a combination of two or 
more independent observations is thus merely the sum of the 
amounts of information in each piece separately. 

It is a common case for a sample of n observations to be
distributed into a finite number of classes, the numbers "expected" in 
each class being functions of one or more unknown parameters; 
if p is the probability of an observation falling into any one class, 
the amount of information in the sample is 

$$S\left\{ \frac{1}{m} \left( \frac{\partial m}{\partial\theta} \right)^2 \right\},$$

where m = np, is the expectation in any one class. The variance 
of an efficient statistic derived from a large sample may, of course, 
be calculated from this expression. 


7. Efficiency of the maximum likelihood solution. 

We shall now prove that when an efficient statistic as defined 
above exists one may be found by the method of maximum 
likelihood. 

If f stand for the probability that any particular type of 
observation should occur, and $\phi$ for the probability that any 
particular type of sample should occur, then 

$$\log \phi = C + S(\log f)$$

when C is a constant which does not involve the parameters, the 
summation extending over all observations. 

As regards the variation of $\phi$ with varying $\theta$, it is to be noted that 

$$\frac{1}{n} \frac{\partial^2}{\partial\theta^2} \log \phi = \frac{1}{n}\, S\left\{ \frac{\partial^2}{\partial\theta^2} \log f \right\}$$

will tend to a fixed limit for large samples. Set (-A) for this 
limiting value. Then since 

$$\frac{\partial}{\partial\theta} \log \phi = 0$$



11.711 



when $\theta = \hat\theta$, and $\hat\theta$ is the solution of the equations of maximum 
likelihood, it follows that 

$$\frac{\partial}{\partial\theta} \log \phi = -nA(\theta - \hat\theta)$$

if $\theta - \hat\theta$ is a small quantity of order $n^{-1/2}$. 

Now if T is any statistic used as an estimate of $\theta$, the
probability, $\Phi$, of T having any assigned value, will be the sum of the 
probabilities of those samples which yield the said value T, that is 

$$\Phi = S(\phi)$$

when S stands for summation over all the samples which yield the 
same value for the statistic T. Also, if T is in large samples 
normally distributed with variance $\sigma^2$, 

$$\Phi = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(T - \theta)^2}{2\sigma^2}}\, dT,$$

whence $$\frac{\partial^2}{\partial\theta^2} \log \Phi = -\frac{1}{\sigma^2}.$$


The problem of making $\sigma^2$ as small as possible, is the problem 
of so grouping the several sorts of samples under the same values 
of the estimate T, that the second differential coefficient of $\log \Phi$ 
shall be a negative quantity as large as possible. Now 

$$\frac{\partial^2}{\partial\theta^2} \log \Phi = \frac{S(\phi'')}{S(\phi)} - \left\{ \frac{S(\phi')}{S(\phi)} \right\}^2,$$

and $S(\phi') \div S(\phi)$ is the mean value, within the group, of 
$-nA(\theta - \hat\theta)$, while $S(\phi'') \div S(\phi)$ is the mean value of 

$$-nA + n^2 A^2 (\theta - \hat\theta)^2,$$

consequently 

$$\frac{1}{\sigma^2} = nA - n^2 A^2\, V'(\hat\theta),$$

when $V'(\hat\theta)$ is the variance of $\hat\theta$ within the group. If $T = \hat\theta$, then $\hat\theta$ 
will be constant within the group, so that the variance of $\hat\theta$ in 
random samples will be 1/nA. For any statistic T which has the 
same value in sets of samples for which the variance of $\hat\theta$ is of 
order $n^{-1}$ the value of $1/n\sigma^2$ will be reduced, for the variance 
within the group is necessarily a positive quantity, and
consequently the variance of any such statistic will be greater than 
that of $\hat\theta$. 


8. Efficiency of weighting. 

The effects of the familiar process of weighting observations 
may be well shown in terms of efficiency. If w is the weight of a 
normally distributed observation x, so that its variance is 1/w, and 




11.712 




if a number of such observations be combined with false weights 
w*, then the variance of the weighted mean will be 

(w'Y^ \wj' 

and, when w' — w, this reduces to 1 JS {w), the minimum value. 
The efficiency is therefore 

(w^) 


S 


The loss of weight is 


M S (~) 


S (w) 


(m>') 


If now the inaccuracy in weighting is due to a slight variation 
in w, and we have chosen weights w' equal to the mean values 
of w, then 

W ^ W' + €, 

s(’J) = SK)-S(«) + sg)--> 

S» (W-) S = S (W) + S (*) - S (J,) 
whence the mean loss of weight is, approximately 

when V (w) is the variance of w. 


+ 
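The effect of false weights can be checked numerically. The following is a minimal sketch, not part of the original paper; the function names and the example weights are hypothetical and chosen only for illustration. It evaluates the variance of the weighted mean, the efficiency, and the loss of weight as defined in this section.

```python
def weighted_mean_variance(true_w, used_w):
    """Variance of the weighted mean when each observation has variance 1/w
    but the observations are combined with the (possibly false) weights w'."""
    total = sum(used_w)
    return sum(wp * wp / w for w, wp in zip(true_w, used_w)) / total ** 2


def efficiency(true_w, used_w):
    """Minimum attainable variance 1/S(w) divided by the variance attained."""
    return (1.0 / sum(true_w)) / weighted_mean_variance(true_w, used_w)


def loss_of_weight(true_w, used_w):
    """S(w) minus the weight (reciprocal variance) actually attained."""
    return sum(true_w) - 1.0 / weighted_mean_variance(true_w, used_w)


true_w = [1.0, 2.0, 4.0]    # hypothetical true weights
equal_w = [1.0, 1.0, 1.0]   # hypothetical false weights (all equal)

print("efficiency with true weights :", efficiency(true_w, true_w))
print("efficiency with equal weights:", efficiency(true_w, equal_w))
print("loss of weight               :", loss_of_weight(true_w, equal_w))
```

With the true weights the efficiency is unity and the loss of weight vanishes; any other choice of weights gives an efficiency below unity.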


9. Small samples; Sufficient statistics.

It is now possible to approach the more general problem of the 
estimation of statistics from finite samples, when the distributions 
of the statistics considered will not generally be of the normal form, 
nor will the errors of random sampling be small quantities. The 
different possible efficient statistics will no longer be equivalent, 
and it will be necessary to discriminate among them. In previous 
work on this subject (1) two circumstances seemed to point to
fruitful lines of development. In the first place attention was called 
to a class of statistics possessing very remarkable properties, which 
contain in themselves the whole of the relevant information 
available in the data. These statistics were therefore distinguished 
by the term sufficient. In the second place it was suggested that 
the idea of intrinsic accuracy might be applied to the random 
sampling distributions of statistics when these were not normal, so 
as to afford a method of comparing their relative values. 




As an example of a sufficient statistic consider the mean of the
Poisson Series. A variate, confined to whole numbers, is distributed
in a Poisson Series if the probability of its taking any particular
value, x, is

$$e^{-m}\,\frac{m^x}{x!}.$$

The parameter m may be estimated from the mean of the
observed sample. This is evidently the solution of the equation of
maximum likelihood. If $\bar{x}$ is the mean of a sample of n, the
distribution of $n\bar{x}$ may readily be proved to be given by the Poisson
Series

$$e^{-nm}\,\frac{(nm)^{n\bar{x}}}{(n\bar{x})!}.$$

Now the probability of drawing in order any particular sample
$x_1, x_2, \ldots, x_n$ is

$$e^{-nm}\,\frac{m^{S(x)}}{x_1!\,x_2!\,\ldots\,x_n!},$$

and this may be divided into two factors,

$$e^{-nm}\,\frac{(nm)^{n\bar{x}}}{(n\bar{x})!} \times \frac{(n\bar{x})!}{n^{n\bar{x}}\,x_1!\,x_2!\,\ldots\,x_n!},$$

of which the first represents the probability that the actual total 
nx should have been scored, and the second the probability, given 
this total, that the partition of it among the n observations should 
be that actually observed. In the latter factor, m, the parameter
sought, does not appear. Now when the mean is known any
further information which the sample has to give must depend on
the observed partition; but the probability of any particular 
partition is wholly independent of the value of m. Consequently 
no statistic calculated from the sample can give any information 
whatever respecting the value of m, beyond that supplied by the 
value of the mean. 
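The factorisation can be verified numerically. The sketch below is an illustration only; the function names and the sample are hypothetical. It computes the probability of an ordered sample from a Poisson Series, the probability of its total, and the probability of the partition given the total, and shows that the last of these does not involve m.

```python
from math import exp, factorial


def sample_probability(xs, m):
    """Probability of drawing, in order, the sample xs from a Poisson Series with mean m."""
    p = 1.0
    for x in xs:
        p *= exp(-m) * m ** x / factorial(x)
    return p


def total_probability(xs, m):
    """Probability that the total of n observations equals S(x): Poisson with mean n*m."""
    n, t = len(xs), sum(xs)
    return exp(-n * m) * (n * m) ** t / factorial(t)


def partition_probability(xs):
    """Probability of the observed partition given the total; m does not appear."""
    n, t = len(xs), sum(xs)
    denom = n ** t
    for x in xs:
        denom *= factorial(x)
    return factorial(t) / denom


sample = [2, 0, 3, 1]  # hypothetical sample of four counts
for m in (0.5, 1.0, 2.5):
    lhs = sample_probability(sample, m)
    rhs = total_probability(sample, m) * partition_probability(sample)
    print(m, lhs, rhs)
```

For every value of m the two columns agree, while `partition_probability` takes no argument m at all, which is the sense in which the mean is sufficient.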

In general, if $\theta$ is any parameter, $T_1$ a statistic sufficient in
estimating that parameter, and $T_2$ any other statistic, the sampling
distribution of simultaneous values of $T_1$ and $T_2$ must be such that
for any given value of $T_1$, the distribution of $T_2$ does not involve $\theta$.

This will evidently be the case, if $f(\theta, T_1, T_2)\,dT_1\,dT_2$ be the
probability that $T_1$ and $T_2$ should fall in the ranges $dT_1$, $dT_2$, and if

$$f(\theta, T_1, T_2) = \phi(\theta, T_1)\cdot\phi'(T_1, T_2).$$

If this condition is fulfilled for all possible statistics $T_2$, then
$T_1$ will be a sufficient statistic.

When a sufficient statistic exists it is equivalent, for all subsequent
purposes of estimation, to the original data from which it
was derived. For example the mean of a normal sample is a
sufficient statistic, and the mean possesses the property that it can
be combined with the mean of a second sample from the same
population to find the mean of the combined sample. If $f(x_1, \ldots, x_n)$
be for any distribution a sufficient estimate of some parameter, then

$$f(x_1, \ldots, x_n) = \phi\{f(x_1, \ldots, x_p),\; f(x_{p+1}, \ldots, x_n)\};$$

this circumstance much limits the functions which can possibly 
be sufficient statistics. 

For example, the function 

$$\frac{1}{k}\log S(e^{kx}) - \frac{1}{k}\log n,$$

is a function which might, for the right distribution, be a sufficient 
statistic. As k is made to increase without limit the above function 
tends to be simply the greatest value observed in the sample; just 
as the mean of a number of means is the mean of the aggregate, 
so the greatest of a series of greatest observations, will be the 
greatest of the aggregate. 
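The limiting behaviour of this function is easy to see numerically. The sketch below is illustrative only; the function name and the sample values are hypothetical. The largest term is factored out before exponentiating, purely to avoid overflow for large k.

```python
from math import exp, log


def log_mean_exp_statistic(xs, k):
    """(1/k) log S(e^{kx}) - (1/k) log n, evaluated stably by
    factoring out the largest observation before exponentiating."""
    top = max(xs)
    return top + (log(sum(exp(k * (x - top)) for x in xs)) - log(len(xs))) / k


sample = [0.3, 1.7, 2.4, 0.9]  # hypothetical observations
for k in (1e-6, 1.0, 10.0, 100.0, 1e4):
    print(k, log_mean_exp_statistic(sample, k))
```

For very small k the value is close to the arithmetic mean of the sample, and as k increases it approaches the greatest observation.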

When sufficient statistics exist it has been shown that they will 
be solutions of the equations of maximum likelihood. 

10. Intrinsic accuracy of error curves of statistics. 

The fact that sufficient statistics do not always exist renders it 
necessary to explore the possibilities of comparing statistics by 
means of the intrinsic accuracy of their random sampling distri¬ 
butions. 

We may, in fact, give an extended meaning to the word 
efficiency by the definition 

The efficiency of a statistic is the ratio of the intrinsic accuracy of 
its random sampling distribution to the amount of information in the 
data from which it has been derived. 

This definition is in accordance with the definition previously
given of efficiency for the case of large samples with normally
distributed statistics. For if $\sigma^2/n$ is the variance in large samples of
an efficient statistic, the intrinsic accuracy of the original
distribution will be $1/\sigma^2$, and the whole information in the data will be
$n/\sigma^2$. Moreover if in large samples any statistic has variance
$\sigma^2/En$, its intrinsic accuracy will be $En/\sigma^2$, and its efficiency, by
either definition, will be E.

The extended definition has the advantage of applying to finite 
samples and to other cases where the distribution is not normal. 
As an example of the calculation of the efficiency of statistics
derived from finite samples, consider the median of an odd number,
2s + 1, of observations the error curve of which is given by

$$df = \frac{1}{\pi}\,\frac{dx}{1 + (x - m)^2}.$$

It is easy to see that the frequency distribution of the median
will be given by

$$df = \frac{(2s+1)!}{(s!)^2\,\pi^{2s+1}} \left(\frac{\pi^2}{4} - \theta^2\right)^{s} \frac{dx}{1 + (x-m)^2},$$

where $\tan\theta = x - m$, and $\theta$ lies between $\pm\tfrac{1}{2}\pi$.

To find the intrinsic accuracy of the distribution, we differentiate
the logarithm of the above expression with respect to the unknown
parameter, m; then since

$$\frac{\partial\theta}{\partial m} = -\cos^2\theta,$$

we have

$$\frac{2s\theta\cos^2\theta}{\frac{\pi^2}{4} - \theta^2} + \sin 2\theta.$$


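The intrinsic accuracy will be the average value of the square
of this quantity, or

$$\frac{(2s+1)!}{(s!)^2\,\pi^{2s+1}} \int_{-\pi/2}^{\pi/2} \left(\frac{2s\theta\cos^2\theta}{\frac{\pi^2}{4} - \theta^2} + \sin 2\theta\right)^{2} \left(\frac{\pi^2}{4} - \theta^2\right)^{s} d\theta.$$

The definite integral may be expressed in terms of the Bessel
functions of $\pi$ and $2\pi$, in the form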


I - - *1^' • ©”• 


The Bessel functions are easily evaluated, for $J_{1/2} = 0$, for both
values of the argument; while the values of $J_{3/2}$ are $\sqrt{2}/\pi$ and
$-1/\pi$, the others being thence obtainable by the recurrence formula

$$J_{n+1} = \frac{2n}{x}\,J_n - J_{n-1}.$$

Thus for s = 2, we find the intrinsic accuracy of the median of five 
observations to be 

1 16 856 

2 77 * 3^* 







The following table shows the numerical values.

Number in                      Increase for two
 sample      s     Accuracy      observations     Efficiency
    1        0      .50000            -            100.00 %
    3        1     1.09064         .59064           72.71 %
    5        2     1.74552         .65488           69.82 %
    7        3     2.44042         .69490           69.73 %
    9        4     3.16164         .72121           70.26 %
   11        5     3.90109         .73945           70.93 %


The efficiency appears to have a minimum between 5 and 7
observations, and is not approaching its limiting value, 81.06 per
cent., very rapidly. This is perhaps to be anticipated since for large
samples it falls short of its limiting efficiency by l*19/.s*, and the 
discrepancy in the above table is considerably less than this. 
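The tabulated accuracies can be checked without the Bessel-function reduction by integrating the squared logarithmic derivative numerically. The sketch below is an illustration under stated assumptions, not Fisher's method of evaluation: it assumes the unit Cauchy error curve $dx / \pi\{1 + (x-m)^2\}$, uses the substitution $\tan\theta = x - m$, and applies a plain midpoint rule; the function name and step count are hypothetical.

```python
from math import cos, sin, pi, factorial


def median_accuracy(s, steps=20000):
    """Intrinsic accuracy of the median of 2s+1 observations from the unit
    Cauchy curve, by midpoint-rule integration over theta in (-pi/2, pi/2)."""
    a = pi / 2
    h = 2 * a / steps
    total = 0.0
    for i in range(steps):
        th = -a + (i + 0.5) * h
        gap = a * a - th * th                      # pi^2/4 - theta^2
        score = 2 * s * th * cos(th) ** 2 / gap + sin(2 * th)
        total += score ** 2 * gap ** s
    const = factorial(2 * s + 1) / (factorial(s) ** 2 * pi ** (2 * s + 1))
    return const * total * h


for s in range(6):
    n = 2 * s + 1
    acc = median_accuracy(s)
    # one observation carries intrinsic accuracy 1/2, so the data carry n/2
    print(n, s, round(acc, 5), round(100 * acc / (0.5 * n), 2))
```

The efficiency column divides each accuracy by n/2, the amount of information in n observations from this curve.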

Statistics which are efficient for large samples may, of course, 
have comparatively low efficiencies for finite samples, and in 
certain cases the efficiency may tend to its limiting value so slowly 
that even samples of over 100 are not very efficiently treated. 
The median, for example, is an efficient statistic for locating the 
centre of the double exponential curve, 

$$df = \tfrac{1}{2}\, e^{-|x - m|}\, dx,$$

when the sample is increased without limit, but owing to the 
discontinuity at the apex, its efficiency approaches its limiting 
value somewhat slowly. The intrinsic accuracy of the original 
distribution is unity, and that of the median of (2s + 1) observations
may be shown to be

$$\frac{(s+1)(2s+1)}{s-1}\left\{1 - \frac{(2s)!}{2^{2s-1}(s!)^2}\right\}.$$

The numerical values are:

Number in                                        Loss of
 sample       s      Accuracy    Efficiency %    information
    1         0        1           100              0
    3         1        2.3178       77.26            .6822
    5         2        3.7500       75.00           1.2500
    7         3        5.2500       75.00           1.7500
    9         4        6.7969       75.52           2.2031
   19         9       14.940        78.63           4.060
   33        16       26.932        81.61           6.068
   51        25       42.844        84.01           8.156
   73        36       62.709        85.90          10.291
   99        49       86.544        87.42          12.456
  129        64      114.36         88.65          14.64
  163        81      146.16         89.67          16.84
  201       100      181.95         90.52          19.05
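The table can be regenerated from the closed expression for the accuracy of the median. The sketch below is illustrative only and assumes that expression in the form $\frac{(s+1)(2s+1)}{s-1}\{1 - \frac{(2s)!}{2^{2s-1}(s!)^2}\}$; the function name is hypothetical. The expression is indeterminate (0/0) at s = 1, so that row is skipped rather than guessed.

```python
from fractions import Fraction
from math import factorial


def laplace_median_accuracy(s):
    """Intrinsic accuracy of the median of 2s+1 observations from the double
    exponential curve, from the closed form; indeterminate at s = 1."""
    if s == 1:
        raise ValueError("the closed form is 0/0 at s = 1")
    # (2s)! / (2^(2s-1) (s!)^2), kept exact with rational arithmetic
    ratio = Fraction(2 * factorial(2 * s), 4 ** s * factorial(s) ** 2)
    return Fraction((s + 1) * (2 * s + 1), s - 1) * (1 - ratio)


for s in (0, 2, 3, 4, 9, 16, 25, 36, 49, 64, 81, 100):
    n = 2 * s + 1
    acc = laplace_median_accuracy(s)
    # the original distribution has unit intrinsic accuracy, so the data carry n
    print(n, s, float(acc), round(100 * float(acc) / n, 2), float(n - acc))
```

The last column is the loss of information, n minus the accuracy, which keeps growing with the sample instead of settling to a fixed limit.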







The case is unusual in that the loss of information does not 
tend to a fixed limit, but increases ultimately as $4\sqrt{s/\pi} - 4$; the
cause of this exceptional behaviour lies in the fact that at the 
apex of the curve the second differential coefficient with respect 
to the unknown, m, is infinite. The example stresses the importance 
of investigating the actual behaviour of statistics from finite 
samples, instead of relying wholly upon their calculated behaviour 
in infinitely large samples. 

We can now prove in general that the efficiency can never 
exceed unity, and derive the condition that there shall be no loss 
of information. 

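If the probability that any statistic, T, should take a particular
value is $\Phi$, then the intrinsic accuracy of the distribution of T is

$$S\left\{\frac{1}{\Phi}\left(\frac{\partial\Phi}{\partial\theta}\right)^{2}\right\},$$

the summation being taken over all possible values. If now every
possible sample, having frequency $\phi$, gave a different value of T,
then the intrinsic accuracy would be

$$S\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^{2}\right\},$$

and would be independent of T. If, however, a number of different
samples give the same value of T, then the effect of this amalgamation
will be to decrease the intrinsic accuracy by the amount

$$S\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^{2}\right\} - S\left\{\frac{1}{\Phi}\left(\frac{\partial\Phi}{\partial\theta}\right)^{2}\right\} = S\left\{\phi\left(\frac{1}{\phi}\frac{\partial\phi}{\partial\theta} - \frac{1}{\Phi}\frac{\partial\Phi}{\partial\theta}\right)^{2}\right\}.$$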

This quantity is never negative, so that the intrinsic accuracy 
of T can never be greater than when every possible sample yields 
a different value of T. This is obvious because, in such a case, the 
actual sample can be reconstructed without ambiguity from the 
value of T, and so the value of T, which is merely a kind of short¬ 
hand statement of the original sample, must contain the whole 
of the information provided by the sample. 
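The effect of amalgamating samples can be seen in a small discrete case. The sketch below is an illustration, not taken from the paper: it assumes three independent trials each succeeding with probability theta, enumerates all eight ordered samples, and compares the intrinsic accuracy of several statistics. The function names and the choice of statistics are hypothetical, and the derivative is taken by a central difference.

```python
from itertools import product


def sample_prob(sample, t):
    """Probability of an ordered sample of successes (1) and failures (0)."""
    p = 1.0
    for x in sample:
        p *= t if x else 1.0 - t
    return p


def group_by(statistic, n):
    """Collect the 2^n ordered samples under the value of the statistic they yield."""
    groups = {}
    for smp in product((0, 1), repeat=n):
        groups.setdefault(statistic(smp), []).append(smp)
    return groups


def information(groups, theta, eps=1e-6):
    """S{(dPhi/dtheta)^2 / Phi} over the possible values of the statistic."""
    total = 0.0
    for samples in groups.values():
        big = sum(sample_prob(s, theta) for s in samples)
        slope = sum(sample_prob(s, theta + eps) - sample_prob(s, theta - eps)
                    for s in samples) / (2 * eps)
        total += slope ** 2 / big
    return total


theta = 0.3
print("whole sample:", information(group_by(lambda s: s, 3), theta))
print("total       :", information(group_by(sum, 3), theta))
print("greatest    :", information(group_by(max, 3), theta))
print("first only  :", information(group_by(lambda s: s[0], 3), theta))
```

The total retains the whole of the information in the sample, while the greatest observation and the first observation alone retain less; no statistic exceeds the value for the whole sample.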

The condition that there shall be no loss of information when
different samples give the same value of T is that the sets of
possible samples for which T is constant shall be those for which

$$\frac{1}{\phi}\,\frac{\partial\phi}{\partial\theta}$$

is constant. If these sets are the same for all values of $\theta$, then the
equation of maximum likelihood

$$\frac{\partial L}{\partial\theta} = 0$$

will provide a sufficient statistic.

For if this is the case $\partial L/\partial\theta$ depends, apart from $\theta$, only on the
set to which the sample belongs; in other words it is a function
of $\theta$ and $\hat\theta$ only. Thus if f is the frequency with which any sample,
or group of samples having the same $\hat\theta$, occurs, then

$$\frac{\partial}{\partial\theta}\log f = \psi(\theta, \hat\theta);$$

now let the frequency of samples such that $\hat\theta$ lies in the range $d\hat\theta$
and a second statistic, T, lies in the range dT, be $f(\theta, \hat\theta, T)\,d\hat\theta\,dT$,
then since the above equation will be true for all values of $\theta$, we
shall integrate it with respect to $\theta$ and obtain

$$\log f = \int \psi(\theta, \hat\theta)\,d\theta + C,$$

where C does not involve $\theta$, and is a function therefore of $\hat\theta$ and T
only. Hence f is of the form

$$\phi(\theta, \hat\theta)\cdot\phi'(\hat\theta, T)$$

whatever statistic may be taken as T, and so $\hat\theta$ must be a sufficient
statistic.


11. Minimal loss of accuracy. 

When the sets of samples which for one value of $\theta$ have the
same value of $\partial L/\partial\theta$, have no longer the same value for other
values of $\theta$, there exists no sufficient statistic, and some loss of
information will necessarily ensue upon the substitution of a single 
estimate for the original data upon which it was based. 

The extent of this loss, in large samples, for which presumably 
it will be greatest, may now be calculated. If the sample consist 
of observed numbers $x_1, \ldots, x_s$ in categories in which the
expectations are $m_1, \ldots, m_s$, then

$$L = S(x \log m);$$

if now $\partial L/\partial\hat\theta = 0$, then to a first approximation

$$\frac{\partial L}{\partial\theta} = (\theta - \hat\theta)\,\frac{\partial^2 L}{\partial\hat\theta^2},$$

and the variance of $\partial L/\partial\theta$ in a set of samples for which $\hat\theta$ is constant,
will be given by the variance of $\partial^2 L/\partial\theta^2$ within the set multiplied
by $(\theta - \hat\theta)^2$, or the total loss of information will be given by the
general variance within such sets multiplied by $V(\hat\theta)$.

Now the random sampling distribution of the values of x will
be the multinomial distribution, and the simplest method of
regarding this distribution is to consider each value of x independently
distributed in a Poisson Series about a mean value m; the
whole being subject to the restriction that

$$S(x) = S(m) = n.$$

In such a system any quantity S(kx) is easily seen to have a
mean value S(km); its variance, if there were no restriction, would
be $S(k^2 m)$ and in introducing any linear restriction we have only
to remove that portion of the variance produced by varying in
the prohibited direction. Two restrictions are here necessary, for
the first

$$S(x) = n$$

we have to deduct $S^2(km) \div n$.

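The second restriction arises from the fact that we require the
variance within the groups for which $\partial L/\partial\theta$ is constant, since

$$\frac{\partial L}{\partial\theta} = S\left(x\,\frac{m'}{m}\right),$$

the deduction will be $S^2(km') \div S\left(\dfrac{m'^2}{m}\right)$.

Writing

$$k = \frac{1}{m}\left(m'' - \frac{m'^2}{m}\right),$$

and remembering that $V(\hat\theta) = 1 / S\left(\dfrac{m'^2}{m}\right)$, we have for the loss of
information in large samples,

$$\frac{S\left\{\dfrac{1}{m}\left(m'' - \dfrac{m'^2}{m}\right)^{2}\right\}}{S\left(\dfrac{m'^2}{m}\right)} - \frac{1}{n}\,S\left(\frac{m'^2}{m}\right) - \frac{S^2\left\{\dfrac{m'}{m}\left(m'' - \dfrac{m'^2}{m}\right)\right\}}{S^2\left(\dfrac{m'^2}{m}\right)}.$$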

The method of calculation by differences has the advantage
that if, by the estimation of other parameters, further restrictions
upon variation are introduced, we may choose such parameters
that their maximum likelihood estimates will be in large samples
uncorrelated with each other and with $\hat\theta$, and the whole effect of
the restrictions will be further to reduce the loss of accuracy by
terms of the form



without further examination of the restrictions. 


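12. Example of loss of intrinsic accuracy.

In the curve

$$df = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}$$

we may write

$$m = \frac{n}{\pi}\,\frac{dx}{1 + (x - \theta)^2};$$

then for the determination of $\theta$, writing t for $x - \theta$,

$$S\left(\frac{m'^2}{m}\right) = \frac{n}{\pi}\int_{-\infty}^{\infty}\frac{4t^2\,dt}{(1+t^2)^3} = \frac{n}{2},$$

$$S\left\{\frac{1}{m}\left(m'' - \frac{m'^2}{m}\right)^{2}\right\} = \frac{n}{\pi}\int_{-\infty}^{\infty}\frac{4(1-t^2)^2\,dt}{(1+t^2)^5} = \frac{7n}{8},$$

$$S\left\{\frac{m'}{m}\left(m'' - \frac{m'^2}{m}\right)\right\} = 0.$$

The loss of information is therefore

$$\frac{7}{4} - \frac{1}{2} = \frac{5}{4},$$

and since the intrinsic accuracy of the original distribution is $\tfrac{1}{2}$,
the loss on statistical reduction is equivalent in large samples to
$2\tfrac{1}{2}$ observations. For small samples the loss will presumably be
less since it vanishes for samples of one.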
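The three sums entering the loss of information for this curve can be checked by numerical integration. The sketch below is illustrative and rests on assumptions stated here: the curve is taken as $dx/\pi\{1+(x-\theta)^2\}$, each sum is expressed per observation as an average over that curve, and the loss is taken as (second sum)/(first sum) minus (first sum)/n minus the squared ratio of the cross sum to the first sum. The function names are hypothetical.

```python
from math import pi, tan


def cauchy_average(g, steps=20000):
    """Average of g(t) over the curve dt / (pi (1 + t^2)), using t = tan(u)
    so that dt / (1 + t^2) = du, with a midpoint rule on (-pi/2, pi/2)."""
    h = pi / steps
    return sum(g(tan(-pi / 2 + (i + 0.5) * h)) for i in range(steps)) * h / pi


def score(t):
    """m'/m for the curve, with t = x - theta."""
    return 2 * t / (1 + t * t)


def curvature(t):
    """m''/m - (m'/m)^2 for the curve, with t = x - theta."""
    return (2 * t * t - 2) / (1 + t * t) ** 2


info = cauchy_average(lambda t: score(t) ** 2)               # S(m'^2/m) / n
second = cauchy_average(lambda t: curvature(t) ** 2)          # S{(1/m)(m'' - m'^2/m)^2} / n
cross = cauchy_average(lambda t: score(t) * curvature(t))     # S{(m'/m)(m'' - m'^2/m)} / n

loss = second / info - info - (cross / info) ** 2
print("intrinsic accuracy per observation:", info)
print("loss of information               :", loss)
print("equivalent observations           :", loss / info)
```

Dividing the loss by the intrinsic accuracy of a single observation expresses it as an equivalent number of observations.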

In the location of the centre of this curve, therefore, we see 
that the mean is a statistic which throws away the greater part of 
the information available; the median is an inefficient statistic 
which makes use of a fraction, approaching in large samples the 
limit $8/\pi^2$ of the information. The solution of the equation of
maximum likelihood, like other efficient statistics makes use of all 
but an amount which tends to a finite limit as the sample is 
increased. The amount lost may differ in different efficient 
statistics; it will be least for the solution of the equation of maxi¬ 
mum likelihood. 





13. Loss of accuracy with other statistics.

With efficient statistics other than the maximum likelihood 
solution, the loss of accuracy will be somewhat greater, though 
still tending to a finite limit for large samples. The variance of 
$\partial L/\partial\theta$ for sets of samples yielding the same statistic will be due
to two independent causes. The first depends upon the sampling
error, $T - \theta$, and upon the fact that $\partial^2 L/\partial\theta^2$ is not the same for
all samples yielding the same statistic. Since all efficient statistics
tend to equivalence with increasing samples, this portion will be
the same for all efficient statistics. The second portion is approximately
independent of the sampling error, and depends upon the
deviation of T from the maximum likelihood solution $\hat\theta$.


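For example, if

$$\chi^2 = S\left\{\frac{(x-m)^2}{m}\right\},$$

the equation for minimum $\chi^2$,

$$S\left\{\frac{x^2 - m^2}{m^2}\,\frac{\partial m}{\partial\theta}\right\} = 0,$$

gives an efficient estimate of $\theta$; for the maximum likelihood equation
is

$$S\left\{\frac{x - m}{m}\,\frac{\partial m}{\partial\theta}\right\} = 0,$$

and the ratio $\dfrac{x^2 - m^2}{m(x - m)}$ tends to the constant value, 2, for large
samples. Now, since

$$(x^2 - m^2) = 2m(x - m) + (x - m)^2,$$

the deviation in $\partial L/\partial\theta$ will be

$$\frac{1}{2}\,S\left\{\frac{(x-m)^2}{m^2}\,\frac{\partial m}{\partial\theta}\right\},$$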

and it is the variance of this quantity for samples of equal sampling
error which gives the loss of information. By the same device as
before we evaluate the variance of $S\{k(x - m)^2\}$ in the form

$$2S(k^2m^2)$$

or, substituting

$$k = \frac{1}{2}\,\frac{m'}{m^2},$$

we have

$$\frac{1}{2}\,S\left(\frac{m'^2}{m^2}\right).$$


This quantity remains finite as the size of the sample is increased
without limit, but increases without limit as the number of classes
is increased. Consequently, as one might expect, the method of
minimising $\chi^2$ breaks down for fine grouping.

For example, suppose we have 5 classes only, in a population
distributed according to the binomial distribution

$$(p + q)^4.$$

Let p be calculated from numbers observed in the 5 possible
classes in a large sample. Then

$$m_1 = np^4, \quad m_2 = 4np^3q, \quad \text{etc.},$$

and the intrinsic accuracy is

$$S\left(\frac{m'^2}{m}\right) = \frac{4n}{pq}.$$


The loss of information due to using the minimum solution is 
calculated from 



2pg I 37>) 


and is in fact 

This is least when p = q, and is then equal to

$$\frac{20}{pq},$$

or equivalent to the loss of 5 observations.
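The figures quoted for p = q can be reproduced exactly. The sketch below is an illustration under an explicit assumption: it takes the loss of information from the minimum $\chi^2$ solution to be the large-sample leading term $\tfrac{1}{2}S(m'^2/m^2)$, and the information per observation to be $4/pq$ for the binomial $(p+q)^4$. The function names are hypothetical, and the general expression printed in the paper is not reproduced here.

```python
from fractions import Fraction


def chi_square_extra_loss(p, classes=5):
    """Leading term (1/2) S{(d log m / dp)^2} over the classes of the
    binomial (p + q)^(classes - 1); the sample size n cancels out."""
    q = 1 - p
    size = classes - 1
    # class r has expectation proportional to p^(size - r) q^r,
    # so d(log m)/dp = (size - r)/p - r/q
    return sum(((size - r) / p - r / q) ** 2 for r in range(classes)) / 2


def information_per_observation(p, classes=5):
    """Intrinsic accuracy contributed by one observation: (classes - 1) / (p q)."""
    return (classes - 1) / (p * (1 - p))


half = Fraction(1, 2)
loss = chi_square_extra_loss(half)
print("loss at p = q           :", loss)
print("20 / pq                 :", 20 / (half * half))
print("equivalent observations :", loss / information_per_observation(half))
print("loss at p = 3/10        :", float(chi_square_extra_loss(Fraction(3, 10))))
```

Using `Fraction` keeps the arithmetic exact, so the agreement with 20/pq at p = q is an identity rather than a rounding coincidence.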

In approaching the maximum likelihood solution by successive
approximations we have seen that starting with an inefficient
statistic, a single process of approximation will in ordinary cases
give an efficient statistic differing from the maximum likelihood
solution, by a quantity which with increasing samples decreases
as $n^{-1}$. The loss of information of such efficient statistics is therefore
finite for large samples, for the additional variance of $\partial L/\partial\theta$ will be

$$L''^2\,V(T - \hat\theta),$$

and $L''$ increases proportionately to the sample. If the process
of approximation be repeated a statistic is obtained differing from 
the maximum likelihood solution only to the order of and 
for such a statistic the loss of accuracy, beyond that suffered by 
the maximum likelihood solution will tend to zero for large samples. 




The practical procedure of fitting will thus not ordinarily require
more than a second process of approximation. 


14. Theoretical existence of fully accurate statistics. 

In the manner in which we have developed the theory it would 
appear that the loss of information inherent in the process of 
replacing a quantity of observational material by a single value, 
arose from the circumstance that the groups of samples, which
ought to give us the same estimate, change with the value of the
unknown parameter, and so that no way of grouping can be the
best for all values of the parameter. The method of maximum 
likelihood takes that way of grouping which is most accurate for 
the particular value which is equal to the estimate arrived at. The 
loss of information with which we are concerned is the difference 
in accuracy between the solution of the equations of maximum 
likelihood, and another statistic which might conceivably be 
arrived at by chance, but which cannot be specified without 
knowledge of the true value of the parameter. 

If, for example, the quantity $\theta$ in the following expression
happened to be equal to the value of m in the population specified by

$$df = \frac{1}{\pi}\,\frac{dx}{1 + (x - m)^2},$$

and we used the equation of estimation

$$S\left\{\frac{2(x-\theta)}{1 + (x-\theta)^2}\right\} = n f(T),$$

where it will be noticed that the left-hand side is $\partial L/\partial\theta$, then
$\partial L/\partial\theta$ is constant for samples giving the same T; the form of f can
be ascertained from the condition of consistency, for the equation
will only be consistent if

$$f(m) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{2(x-\theta)}{1 + (x-\theta)^2}\,\frac{dx}{1 + (x-m)^2} = \frac{2(m-\theta)}{(m-\theta)^2 + 4}.$$

The equation for T will therefore be

$$S\left\{\frac{2(x-\theta)}{1 + (x-\theta)^2}\right\} = \frac{2n(T-\theta)}{(T-\theta)^2 + 4};$$

this equation is nearly equivalent to the equation we have given
for improving an approximate value, in this case $\theta$. If, however,
$\theta$ were in fact equal to the true value of the parameter, the statistic
T would be distributed in random samples with an intrinsic
accuracy equal to the maximum possible. We cannot, however,
utilise this fact, for if we could rely upon our putting the right
value of $\theta$ into this equation, we could choose $\theta$ as our estimate
and so avoid errors of random sampling altogether.
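The condition of consistency used above is the statement that the mean value of $2(x-\theta)/\{1+(x-\theta)^2\}$, for x distributed in the curve centred at m, equals $2(m-\theta)/\{(m-\theta)^2+4\}$. The sketch below checks this numerically; it is an illustration only, with hypothetical function names and arbitrary test values of m and $\theta$, and it uses a midpoint rule after the substitution $x = m + \tan u$.

```python
from math import pi, tan


def expected_score(m, theta, steps=20000):
    """Mean of 2(x - theta) / (1 + (x - theta)^2) when x follows the curve
    dx / (pi (1 + (x - m)^2)), by midpoint integration in u with x = m + tan(u)."""
    h = pi / steps
    total = 0.0
    for i in range(steps):
        x = m + tan(-pi / 2 + (i + 0.5) * h)
        total += 2 * (x - theta) / (1 + (x - theta) ** 2)
    return total * h / pi


def closed_form(m, theta):
    """The expression 2(m - theta) / ((m - theta)^2 + 4)."""
    d = m - theta
    return 2 * d / (d * d + 4)


for m, theta in ((1.3, 0.4), (0.0, 0.0), (-2.0, 1.5)):   # arbitrary test values
    print(m, theta, expected_score(m, theta), closed_form(m, theta))
```

The two columns agree for each pair, and both vanish when $\theta$ coincides with m.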





15. Use of ancillary statistics. 


Since the original data cannot be replaced by a single statistic, 
without loss of accuracy, it is of interest to see what can be done 
by calculating, in addition to our estimate, an ancillary statistic 
which shall be available in combination with our estimate in 
future calculations. 

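If our two statistics specify the values of $\partial L/\partial\theta$ and $\partial^2 L/\partial\theta^2$
for some central value of $\theta$, such as $\hat\theta$, then the variance of $\partial L/\partial\theta$
over the sets of samples for which both statistics are constant, will
be that of

$$\tfrac{1}{2}(\theta - \hat\theta)^2\,\frac{\partial^3 L}{\partial\theta^3},$$

which will ordinarily be of order $n^{-1}$ at least. With the aid of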
such an ancillary statistic the loss of accuracy tends to zero for 
large samples. 

The function of the ancillary statistic is analogous to providing 
a true, in place of an approximate, weight for the value of the 
estimate. If a number of large samples were available, and if 

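$$M = S(L)$$

when the summation is taken over all the samples, then M will
be the logarithm of the likelihood of $\theta$ from the combined sample;
but necessarily

$$M' = S(L'),$$

and if $\hat\theta$ be the value of $\theta$ for which $M'$ vanishes, and $\hat\theta_1$ the value
for which $L'$ vanishes, then with large samples, when $\theta = \hat\theta$,

$$L' = (\hat\theta - \hat\theta_1)\,L''.$$

Hence $\hat\theta$ is given by the equation

$$S\{(\hat\theta - \hat\theta_1)\,L''\} = 0,$$

or

$$\hat\theta = \frac{S(\hat\theta_1 L'')}{M''},$$

where $M'' = S(L'')$.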
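The weighting role of the ancillary statistic can be put in a few lines. The sketch below is an illustration resting on an assumption: that the combined estimate is the mean of the separate estimates, each weighted by the second differential coefficient $L''$ of its own log likelihood. The function name and all numerical values are hypothetical.

```python
def combine_estimates(estimates, second_derivatives):
    """Combined estimate S(theta_i * L''_i) / S(L''_i): each sample's estimate
    is weighted by the second derivative of its own log likelihood."""
    weighted = sum(t * d for t, d in zip(estimates, second_derivatives))
    return weighted / sum(second_derivatives)


# hypothetical estimates from two large samples, with their observed L''
# (negative at a maximum of the likelihood)
estimates = [1.0, 2.0]
second_derivatives = [-30.0, -10.0]

print(combine_estimates(estimates, second_derivatives))
# equal weights, as if the ancillary statistic had been ignored
print(combine_estimates(estimates, [-20.0, -20.0]))
```

The sample whose likelihood is more sharply curved pulls the combined value towards its own estimate; replacing the observed curvatures by a common mean value gives the plain average instead.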


If we had ignored the ancillary statistic and taken as weights 
the mean value 




vW 


the loss of weight in the combined value, 





is the sum of the contributions to the loss of weight from the
several samples. Each contribution is equal to the sampling
variance of $L''$ multiplied by $V(\hat\theta)$, and this is just the quantity
we have found as measuring the loss of information.


REFERENCES. 

(1) Fisher, R. A. (1921). "The mathematical foundations of theoretical
statistics." Phil. Trans. A, vol. 222, pp. 309-368.

(2) ---- (1920). "A mathematical examination of the methods of determining
the accuracy of an observation by the mean error and by the mean square
error." Monthly Notices of R.A.S., vol. 80, pp. 758-770.

(3) ---- (1924). "The conditions under which $\chi^2$ measures the discrepancy
between observation and hypothesis." J.R.S.S., vol. 87, pp. 442-450.



12

ON A DISTRIBUTION YIELDING THE ERROR 
FUNCTIONS OF SEVERAL WELL KNOWN 
STATISTICS 


AUTHOR’S NOTE 

This paper was written to call attention to the fact that many recently
solved problems of distribution involve only a single family of
distributions, that of $\chi^2$, z, and t. It is concerned to bring out the
mathematical basis of these, their mutual relationships, and the
principal applications then known. It is reprinted with minor adjustments
of notation.


* Reprinted from Proceedings of the International Mathematical Congress,
Toronto, pp. 805-813, 1924.





ON A DISTRIBUTION YIELDING THE ERROR FUNCTIONS OF 
SEVERAL WELL KNOWN STATISTICS 


By Mr. R. A. Fisher, 

Statistical Department, Rothamsted Experimental Station, Harpenden, England. 

1. Theoretical Distributions 

The idea of an error function is usually introduced to students in con¬ 
nection with experimental errors; the normal curve itself is often introduced 
almost as if it had been obtained experimentally, as an error function. The 
student is not usually told that little or nothing is known about experimental 
errors, that it is not improbable that every instrument, and every observer, 
and every possible combination of the two has a different error curve, or that 
the error functions of experimental errors are only remotely related to the error 
functions which are in practical use, because these are applied in practice not 
to single observations but to the means and other statistics derived from a 
number of observations. 

Many statistics tend to be normally distributed as the data from which 
they are calculated are increased indefinitely; and this I suggest is the genuine 
reason for the importance which is universally attached to the normal curve. 
On the other hand some of the most important statistics do not tend to a normal 
distribution, and in many other cases, with small samples, of the size usually 
available, the distribution is far from normal. In these cases tests of significance
based upon the calculation of a “probable error” or “standard error” are inadequate,
and may be very misleading. In addition to tests of significance,
tests of goodness of fit also require accurate error functions; both types of test 
are constantly required in practical research work; the test of goodness of fit 
may be regarded as a kind of generalized test of significance, and affords an 
a posteriori justification of the error curves employed in other tests. 

Historically, three distributions of importance had been evolved by students 
of the theory of probability before the rise of modern statistics; they are 


Distribution               Due to             Date
Binomial expansion         Bernoulli          c. 1700,
Normal curve               Laplace, Gauss     1783,
Exponential expansion      Poisson            1837;


of these the series of Bernoulli and Poisson, although of great importance, 
especially the latter, are in a different class from the group of distributions with 
which we are concerned, for they give the distribution of frequencies, and are 
consequently discontinuous distributions. 





2. Pearson’s $\chi^2$ Distribution

In 1900 Pearson devised the $\chi^2$ test of goodness of fit. If $x_1, x_2, \ldots, x_{n'}$ are
the observed frequencies in a series of $n'$ classes, and $m_1, m_2, \ldots, m_{n'}$ the
corresponding expectations, then the discrepancy between expectation and
observation may be measured by calculating

$$\chi^2 = S\left\{\frac{(x-m)^2}{m}\right\}.$$

The discrepancy is significant if $\chi^2$ has a value much greater than usually
occurs, when the discrepancy measured is that between a random sample, and
the population from which it is drawn. To judge of this we need to know the
random sampling distribution of $\chi^2$. This distribution Pearson gave in his
paper of 1900. The distribution for large samples is not normal; it is independent
of the actual values of $m_1, \ldots, m_{n'}$; but it includes a parameter which, according
to Pearson’s original exposition, was to be identified with the number of
frequency classes. Consequently in Pearson’s original table, and in the fuller
table given soon after by Elderton, the table is entered with the parameter $n'$,
which can take all integer values from 2 upwards.

More recently, it has been shown that Pearson neglected, as small quantities 
of the second order, certain corrections, which in fact do not tend to zero, but 
to a finite value, as the sample is increased. These corrections are, in fact, not 
small at all. In consequence of this, most of the tests of goodness of fit made 
in the last 25 years require revision. The important point is, however, that 
the distributions found by Pearson still hold, if we interpret $n'$, not as the number
of frequency classes, but as one more than the number of degrees of freedom, in
which observation may differ from expectation. For example, in a contingency
table of r rows and c columns, we ought not to take

$$n' = cr$$

but

$$n' - 1 = (c-1)(r-1)$$

in recognition of the fact that the marginal totals of the table of expectation
have been arrived at by copying down the marginal totals of the table of
observations. For instance in a 3×5 table we should put $n' = 9$, and not $n' = 15$.
In a 2×2 table $n' = 2$, not $n' = 4$.
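The rule for contingency tables is simple enough to state as code. The sketch below is illustrative; the function names are hypothetical.

```python
def degrees_of_freedom(rows, cols):
    """Degrees of freedom in which an r x c contingency table may differ
    from expectation once the marginal totals are copied from the data."""
    return (rows - 1) * (cols - 1)


def pearson_table_parameter(rows, cols):
    """The parameter n' with which Pearson's table is to be entered:
    one more than the number of degrees of freedom."""
    return degrees_of_freedom(rows, cols) + 1


for rows, cols in ((3, 5), (2, 2)):
    print(rows, cols, degrees_of_freedom(rows, cols), pearson_table_parameter(rows, cols))
```

For the 3×5 and 2×2 tables this gives $n'$ = 9 and $n'$ = 2, in place of the class counts 15 and 4.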

One consequence of this is that it is more convenient to take $n = n' - 1$,
representing the number of degrees of freedom, as the parameter of the tables;
in fact, to number the tables from (say) 1 to 50 instead of from 2 to 51. The
real importance of n is shown by the fact that if we have a number of quantities
$x_1, \ldots, x_n$, distributed independently in the normal distribution with unit
standard deviation, and if

$$\chi^2 = S(x^2),$$

then $\chi^2$, so defined, will be distributed as is the Pearsonian measure of goodness
of fit; n is, in fact, the number of independent squares contributing to $\chi^2$. The
mean value of $\chi^2$ is equal to n.
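The statement that the mean value of $\chi^2$ equals n can be illustrated by simulation. The sketch below is a rough check only, not a proof; the function name, the number of trials, and the seed are arbitrary choices.

```python
import random


def simulated_chi_square_mean(n, trials=20000, seed=1):
    """Average, over many trials, of the sum of squares of n independent
    normal deviates with unit standard deviation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return total / trials


for n in (1, 5, 10):
    print(n, simulated_chi_square_mean(n))
```

Each average falls close to n, within the sampling error expected from the number of trials.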




The $\chi^2$ distribution is the first of the family of distributions of which I will
speak, and like the others it turns up more or less unexpectedly in the distributions
of a variety of statistics. In a noteworthy paper in 1908, "Student" investigated
the error curve of the Standard Deviation of a small sample from a
normal distribution, and with remarkable penetration he suggested a form for
this error curve which has proved to be exact. The relation of this curve with
that of $\chi^2$ is close; if x stand for any value of a normal sample, $\bar{x}$ for the mean,
and $\sigma$ for the standard deviation of the population, then

$$\chi^2 = \frac{S(x-\bar{x})^2}{\sigma^2} = \frac{ns^2}{\sigma^2},$$

where n, the number of degrees of freedom, is one less than the number in the
sample, and $s^2$ is the best estimate from the sample of the true variance, $\sigma^2$.

Another example of the occurrence of the $\chi^2$ distribution crops up in the
study of small samples from the Poisson Series. In studying the accuracy of
methods of estimating bacterial populations by the dilution method I was led
to the fact that the statistic

$$\chi^2 = \frac{S(x-\bar{x})^2}{\bar{x}},$$

when x is a single observation, and $\bar{x}$ the mean of the sample, is distributed
wholly independently of the true density of the population sampled; for ordinary
values of $\bar{x}$, but not for very small values, it is also distributed independently
of $\bar{x}$, and its distribution is that of $\chi^2$ with n one less than the number in the
sample.
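The statistic for a sample of counts is computed directly from its definition. The sketch below is illustrative, taking the statistic as the sum of squared deviations from the sample mean divided by that mean; the function name and the counts are hypothetical.

```python
def index_of_dispersion(counts):
    """S(x - mean)^2 / mean for a sample of counts; to be referred to the
    chi-square distribution with one degree of freedom fewer than the
    number of counts."""
    mean = sum(counts) / len(counts)
    return sum((x - mean) ** 2 for x in counts) / mean


counts = [3, 5, 4, 6, 2]   # hypothetical plate counts
print("statistic         :", index_of_dispersion(counts))
print("degrees of freedom:", len(counts) - 1)
```

A value far above the number of degrees of freedom would suggest more variation between counts than a Poisson Series allows.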

A similarly distributed index of dispersion may be used for testing the 
variation of samples of the binomial and multinomial series. The case of the 
binomial is interesting to economists, in that it leads at once to a test of the 
significance of the Divergence-Coefficient of Lexis. In fact, the method of 
Lexis was completed, and made capable of exact application, from the time of 
the first publication of the table of $\chi^2$. I do not think, however, that this has
been observed, either by exponents of the method of Lexis and his successors, 
or by exponents of the test of goodness of fit. 

3. The General z Distribution 

The most direct way of passing from the $\chi^2$ distribution to the more
general distribution to which I wish to call attention is to consider two samples
of normal distributions, and how the two estimates of the variance may be compared.
We have two estimates $s_1^2$ and $s_2^2$ derived from the two small samples,
and we wish to know, for example, if the variances are significantly different.
If we introduce hypothetical true values $\sigma_1^2$ and $\sigma_2^2$ we could theoretically calculate
in terms of $\sigma_1$ and $\sigma_2$, how often $s_1^2 - s_2^2$ (or $s_1 - s_2$) would exceed its observed
value. The probability would of course involve the hypothetical $\sigma_1$ and $\sigma_2$, and
our formulae could not be applied unless we were willing to substitute the
observed values $s_1^2$ and $s_2^2$ for $\sigma_1^2$ and $\sigma_2^2$; but such a substitution, though quite
legitimate with large samples, for which the errors are small, becomes extremely
misleading for small samples; the probability derived from such a substitution
would be far from exact. The only exact treatment is to eliminate the unknown
quantities $\sigma_1$ and $\sigma_2$ from the distribution by replacing the distribution of s by
that of log s, and so deriving the distribution of $\log s_1/s_2$. Whereas the sampling
errors in $s_1$ are proportional to $\sigma_1$, the sampling errors of $\log s_1$ depend only upon
the size of the sample from which $s_1$ was calculated.

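We may now write

$$n_1 s_1^2 = \sigma_1^2 \chi_1^2 = \sigma_1^2 S_1(x^2), \qquad n_2 s_2^2 = \sigma_2^2 \chi_2^2 = \sigma_2^2 S_2(x^2),$$

$$e^{2z} = \frac{s_1^2}{s_2^2} = \frac{\sigma_1^2}{\sigma_2^2} \cdot \frac{n_2 S_1(x^2)}{n_1 S_2(x^2)},$$

where $S_1$ and $S_2$ are the sums of squares of (respectively) $n_1$ and $n_2$ independent
quantities; then $z$ will be distributed about $\log \dfrac{\sigma_1}{\sigma_2}$ as mode, in a distribution
which depends wholly on the integers $n_1$ and $n_2$. Knowing this distribution we
can tell at once if an observed value of $z$ is or is not consistent with any
hypothetical value of the ratio $\sigma_1/\sigma_2$.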

The distribution of $z$ involves the two integers $n_1$ and $n_2$ symmetrically, in
the sense that if we interchange $n_1$ and $n_2$ we change the sign of $z$. In more
detail, if $P$ is the probability of exceeding any value, $z$, then interchanging $n_1$
and $n_2$, $1 - P$ will be the probability of exceeding $-z$.
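The sign symmetry just described can be illustrated with a short sketch. It assumes that $z$ is computed as half the natural logarithm of the ratio of two variance estimates; the function name `z_statistic` and the sample values are hypothetical and serve only as an illustration.

```python
import math
import statistics


def z_statistic(sample_1, sample_2):
    """Return z = log(s1 / s2) = 0.5 * log(s1^2 / s2^2) for two samples."""
    # statistics.variance divides by (len - 1), so the estimates rest on
    # n1 = len(sample_1) - 1 and n2 = len(sample_2) - 1 degrees of freedom.
    s1_sq = statistics.variance(sample_1)
    s2_sq = statistics.variance(sample_2)
    return 0.5 * math.log(s1_sq / s2_sq)


# Hypothetical measurements, chosen only to show the symmetry.
a = [4.1, 5.3, 3.8, 6.0, 5.1]
b = [4.9, 5.0, 5.2, 4.8, 5.1, 5.0]

z_ab = z_statistic(a, b)  # first sample in the numerator
z_ba = z_statistic(b, a)  # samples interchanged: the sign of z reverses
```

Interchanging the two samples interchanges $n_1$ and $n_2$ and reverses the sign of $z$, so a single table of the upper tail suffices for both directions.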

Values of special interest for $n_1$ and $n_2$ are $\infty$ and unity. If $n_2$ is infinite,
$S_2/n_2$ tends to unity and consequently we have the $\chi^2$ distribution, subject to
the transformation

$$e^{2z} = \frac{\chi^2}{n} \qquad \text{or} \qquad z = \tfrac{1}{2} \log \frac{\chi^2}{n};$$

similarly if $n_1$ is infinite, we have the $\chi^2$ distribution again with

$$z = -\tfrac{1}{2} \log \frac{\chi^2}{n},$$

the curves being now reversed so that the $P$ of one curve corresponds to $1 - P$
of the other.

In the second special case, when $n_1 = 1$, we find a second important series
of distributions, first found by “Student” in 1908. In discussing the accuracy
to be ascribed to the mean of a small sample, “Student” took the revolutionary
step of allowing for the random sampling variation of his estimate of the standard
error. If the standard error were known with accuracy the deviation of an
observed value from expectation (say zero), divided by the standard error,
would be distributed normally with unit standard deviation; but if for the
accurate standard deviation we substitute an estimate based on $n$ degrees of
freedom we have

$$t = \frac{x}{s}, \qquad t^2 = \frac{n\,x^2}{S(x^2)};$$

consequently the distribution of $t$ is given by putting $n_1 = 1$, and substituting
$z = \tfrac{1}{2} \log t^2$.

The third special case occurs when both $n_1 = 1$ and $n_2 = \infty$, and as is obvious
from the above formulae, it reduces to the normal distribution with

$$z = \tfrac{1}{2} \log x^2.$$

In fact, one series of modifications of the normal distribution gives the $\chi^2$ distributions,
a second series of modifications gives the curves found by “Student”, while if both
modifications are applied simultaneously we have the double series of distributions
appropriate to $z$.

Like the $\chi^2$ distribution, the distribution found by “Student” has many
more applications beyond that for which it was first introduced. It was
introduced to test the significance of a mean of a unique sample; but as its relation to
the $z$ distribution shows, it occurs wherever we have to do with a normally
distributed variate, the standard deviation of which is not known exactly, but is
estimated independently from deviations amounting to $n$ degrees of freedom.
For example, in a more complicated form it gives a solution for testing the
significance of the difference between two means, a test constantly needed in
experimental work.

An enormously wide extension of the applications of “Student’s” curves 
flows from the fact that not only means, but the immense class of statistics 
known as regression coefficients may all be treated in the same way; and indeed 
must be treated in the same way if tests of significance are to be made in cases 
where the number of observations is not large. And in many practical cases 
the number is not large; if a meteorologist with 20 years' records of a place
wishes to ask if the observed increase or decrease in rainfall is significant, or if
in an agricultural experiment carried out for 20 years, one plot has seemed to gain
in yield compared to a second plot differently treated, Student’s curves provide
an accurate test, where the ordinary use of standard errors or probable errors
is altogether misleading.

The more general distribution of $z$, like its special cases, crops up very
frequently. I found it first in studying the error functions of the correlation
coefficient. If the correlation, let us say, between brothers, is obtained
by forming a symmetrical table, we obtain what is called an intraclass
correlation. If $r$ is such a correlation, let

$$r = \tanh z, \qquad n_1 = N - 1, \qquad n_2 = N,$$

then this transformation expresses the random sampling distribution of $r$ in
terms of that of $z$, when $N$ is the number in the sample.





It was the practical advantages of the transformation that appealed to me
at the time. The distribution of $r$ is very far from normal for all values of the
correlation ($\rho$), and even for large samples when the correlation is high; its
accuracy depends greatly upon the true value of $\rho$, which is of course unknown,
and the form of the distribution changes rapidly as $\rho$ is changed. On the other
hand the distribution of $z$ is nearly normal even for small values of $n$; it is
absolutely constant in accuracy and form for all values of $\rho$, so that if we are not
satisfied with its normality we can make more accurate calculations.

The distribution shows a small but constant bias in the value of $z$, when $r$ is
derived from the symmetrical table; this bias disappears if instead of starting
from the correlation as given by the symmetrical table we approach the matter
from a more fundamental standpoint. Essentially we estimate the value of
an intraclass correlation by estimating the ratio of two variances, the intraclass
variance found by comparing members of the same class, and the variance
between the observed means of the different classes. From $n$ classes of $s$
observations each, we have $n(s - 1)$ degrees of freedom for the intraclass variance,
and $n - 1$ degrees of freedom for the variance of the means. From the definition
of the $z$ distribution it will obviously be reproduced by the errors in the ratio
of two independent estimates of the variance, and if we estimate the variances
accurately the bias in the estimate of the correlation will be found to have
disappeared; it was, in fact, introduced by the procedure of forming the symmetrical
table.

The practical working of cases involving the $z$ distribution can usually be
shown most simply in the form of an analysis of variance. If $x$ is any value,
$\bar{x}_p$ the mean of any class, and $\bar{x}$ the general mean, $n$ the number of classes of $s$
observations each, the following table shows the form of such an analysis:


                              Analysis of Variance

Variance           Degrees of Freedom    Sum of Squares                        Mean Square

Between classes    $n_1 = n - 1$         $s\,S_1^{n}(\bar{x}_p - \bar{x})^2$   $s_1^2$

Within classes     $n_2 = n(s - 1)$      $S_1^{ns}(x - \bar{x}_p)^2$           $s_2^2$

Total              $ns - 1$              $S_1^{ns}(x - \bar{x})^2$


The two columns headed Degrees of Freedom and Sum of Squares must add
up to the totals shown; the mean squares are obtained by dividing the sums of
squares by the corresponding degrees of freedom, then $z = \log s_1/s_2$ may be used
to test the significance of the intraclass correlation, or the significance of its
deviations from any assigned value.
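The arithmetic of such an analysis can be sketched as follows. This is a minimal illustration, assuming $n$ classes of equal size $s$; the function name `one_way_analysis` and the data are hypothetical and are not taken from the paper.

```python
import math


def one_way_analysis(classes):
    """Analysis of variance for n classes of s observations each."""
    n = len(classes)         # number of classes
    s = len(classes[0])      # observations per class (assumed equal)
    class_means = [sum(c) / s for c in classes]
    general_mean = sum(sum(c) for c in classes) / (n * s)

    ss_between = s * sum((m - general_mean) ** 2 for m in class_means)
    ss_within = sum((x - m) ** 2
                    for c, m in zip(classes, class_means) for x in c)
    ss_total = sum((x - general_mean) ** 2 for c in classes for x in c)

    n1, n2 = n - 1, n * (s - 1)          # degrees of freedom
    ms_between = ss_between / n1         # mean square s1^2
    ms_within = ss_within / n2           # mean square s2^2
    z = 0.5 * math.log(ms_between / ms_within)   # z = log(s1 / s2)
    return {"n1": n1, "n2": n2, "ss_between": ss_between,
            "ss_within": ss_within, "ss_total": ss_total, "z": z}


# Hypothetical data: three classes of three observations each.
result = one_way_analysis([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The two sums of squares add up to the total, as do the degrees of freedom, which gives a convenient check on the arithmetic.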

4. Multiple Correlations 

The same method may be used to solve the long outstanding problem of
the significance of the multiple correlation. If $y$ is the dependent variate, and
$x_1, x_2, \ldots, x_p$ are $p$ independent variates, all measured from their means, and if
the regression of $y$ on $x_1, \ldots, x_p$ is expressed by the equation

$$Y = b_1 x_1 + \ldots + b_p x_p$$

such that the correlation of $y$ with $Y$ is $R$, then $R$ is termed the multiple
correlation of $y$ with $x_1, \ldots, x_p$, and the ingredients of the analysis of variance are as
follows:

Variance                              Degrees of Freedom    Sum of Squares

Of regression formula                 $p$                   $S(Y^2)$

Deviations from regression formula    $N - p - 1$           $S(y - Y)^2$

Total                                 $N - 1$               $S(y^2)$

For samples from uncorrelated material the distribution of $R$ may therefore
be inferred from that of $z$, the actual curve for $N$ observations and $p$ independent
variates being

$$df = \frac{\frac{N-3}{2}!}{\frac{N-p-3}{2}!\;\frac{p-2}{2}!}\,(R^2)^{\frac{1}{2}(p-2)}\,(1 - R^2)^{\frac{1}{2}(N-p-3)}\,d(R^2),$$

which degenerates when $p = 1$, into the better known distribution for a single
independent variate

$$df = \frac{\frac{N-3}{2}!}{\frac{N-4}{2}!\;\sqrt{\pi}}\,(1 - R^2)^{\frac{1}{2}(N-4)}\,dR,$$

a distribution first suggested by “Student”, which has been established since
1915.


5. The Correlation Ratio 

The distribution, for uncorrelated material, of the correlation ratio $\eta$ is
clearly similar to that of the multiple correlation, $R$, and resembles the case of
the intraclass correlation when the number of observations varies from class to
class.

A number of values of the variate $y$ are observed for each of a series of
values of the variate $x$; $n_p$ is the number observed in each array, $\bar{y}$ the mean
of the observed values and $\bar{y}_p$ the mean of any array; the variance of the
variate $y$ may be analysed as follows, there being $a$ arrays:


Variance          Degrees of freedom    Sum of Squares

Between arrays    $a - 1$               $S\{n_p(\bar{y}_p - \bar{y})^2\} = \eta^2\,S(y - \bar{y})^2$

Within arrays     $S(n_p - 1)$          $S(y - \bar{y}_p)^2 = (1 - \eta^2)\,S(y - \bar{y})^2$

Total             $S(n_p) - 1$          $S(y - \bar{y})^2$


$\eta^2$ is thus distributed just like $R^2$. The transformation is

$$\frac{\eta^2}{1 - \eta^2} = \frac{n_1}{n_2}\,e^{2z}.$$











If the observations are increased so that $n_2 \to \infty$, then

$$n_2\,\frac{\eta^2}{1 - \eta^2}$$

tends to be distributed in the $\chi^2$ distribution corresponding to $(a - 1)$ degrees
of freedom, while for the multiple correlation with $p$ independent variates

$$n_2\,\frac{R^2}{1 - R^2}$$

tends to be distributed in the $\chi^2$ distribution with $p$ degrees of freedom. These
are two examples of statistics not tending to the normal distribution for large
samples.

More important than either of these is the use of the $z$ distribution in testing
the goodness of fit of regression lines, whether straight or curved. If $Y$ stand
for the function of $x$ to be tested representing the variation of the mean value of
$y$ for different values of $x$, let there be $a$ arrays and let $q$ constants of the
regression formulae have been adjusted to fit the data by least squares, then the
deviations of the observations from the regression line may be analysed as
follows:


Variance due to                           Degrees of freedom    Sum of Squares

Deviation of array mean from formula      $a - q$               $S\{n_p(\bar{y}_p - Y)^2\} = (\eta^2 - R^2)\,S(y - \bar{y})^2$

Deviation within array                    $N - a$               $S(y - \bar{y}_p)^2 = (1 - \eta^2)\,S(y - \bar{y})^2$

Total                                     $N - q$               $S(y - Y)^2 = (1 - R^2)\,S(y - \bar{y})^2$

where $R$ is the correlation between $y$ and $Y$. The transformation this time is

$$\frac{\eta^2 - R^2}{1 - \eta^2} = \frac{n_1}{n_2}\,e^{2z},$$

and if the sample is increased indefinitely

$$n_2\,\frac{\eta^2 - R^2}{1 - \eta^2}$$

tends to the $\chi^2$ distribution for $(a - q)$ degrees of freedom.


This result is in striking contrast to a test which is in common use under the
name of Blakeman’s criterion, which is designed to test the linearity of regression
lines, and which also uses the quantity $\eta^2 - r^2$. Our formula shows that, for
large samples, with linear regression,

$$(N - a)\,\frac{\eta^2 - r^2}{1 - \eta^2}$$

has a mean value $a - 2$. The failure of Blakeman’s criterion to give even a
first approximation lies in the fact that, following some of Pearson’s work,
the number of the arrays is ignored, whereas, in fact, the number of arrays
governs the whole distribution.







Summary 


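The four chief cases of the $z$ distribution have the following applications:


I.    Normal Curve    Many statistics from large samples

II.   $\chi^2$        Goodness of fit of frequencies
                      Index of dispersion for Poisson and Binomial samples
                      Variance of normal samples

III.  Student’s       Mean
                      Regression coefficient
                      Comparison of means and regressions

IV.   $z$             Intraclass correlations
                      Multiple correlation
                      Comparison of variances
                      Correlation ratio
                      Goodness of fit of regressions


COMPARISON OF DISTRIBUTION FORMULAE

Normal.       Constant factor $\dfrac{1}{\sqrt{2\pi}}$;
              frequency $e^{-\frac{1}{2}x^2}\,dx$.

$\chi^2$.     Constant factor $\dfrac{1}{2^{\frac{1}{2}(n-2)}\;\frac{n-2}{2}!}$;
              frequency $\chi^{n-1}\,e^{-\frac{1}{2}\chi^2}\,d\chi$.

Student’s.    Constant factor $\dfrac{n^{\frac{1}{2}n}\;\frac{n-1}{2}!}{\sqrt{\pi}\;\frac{n-2}{2}!}$;
              frequency $(n + t^2)^{-\frac{1}{2}(n+1)}\,dt$.

$z$.          Constant factor $\dfrac{2\,n_1^{\frac{1}{2}n_1}\,n_2^{\frac{1}{2}n_2}\;\frac{n_1+n_2-2}{2}!}{\frac{n_1-2}{2}!\;\frac{n_2-2}{2}!}$;
              frequency $\dfrac{e^{n_1 z}\,dz}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1+n_2)}}$.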


Addendum Aug., 1927. 

Since the International Mathematical Congress (Toronto, 1924) the practical
applications of the developments summarized in this paper have been more
fully illustrated in the author’s book Statistical Methods for Research Workers
(Oliver and Boyd, Edinburgh, 1925). The Toronto paper supplies in outline
the mathematical framework around which the book has been built, for a
formal statement of which some reviewers would seem to have appreciated the
need.





13 

THE MATHEMATICAL DISTRIBUTIONS USED 
IN THE COMMON TESTS OF SIGNIFICANCE 


AUTHOR’S NOTE 

This paper was written to give mathematical teachers a concise ac¬ 
count of these distributions, with demonstrations and the analysis 
of their probability integrals. These are identified in various forms 
with partial sums of the binomial and Poisson series. Papers 12 
and 13 together supply a compact mathematical background for the 
common tests of significance. 


• Reprinted from Econometrica, Vol. 3, No. 4, pp. 353—365, 1935. 



THE MATHEMATICAL DISTRIBUTIONS USED
IN THE COMMON TESTS OF SIGNIFICANCE

By R. A. Fisher 

Introduction. —The three frequency distributions which provide the 
greatest number of tests of significance in common use are all closely 
related. The main types of application will be found illustrated arith¬ 
metically in the author’s book Statistical Methods for Research Workers 
and in other publications in which extensive use is made of the arith¬ 
metical arrangement known as the Analysis of Variance. Some need 
has, however, been felt by mathematicians for a concise account of the 
algebraic properties and relationships of these distributions, and the 
following are essentially lecture notes designed to give a mathematical 
student a clear account of their properties. 

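1. The frequency distribution of $\chi^2$. If $x_1, x_2, \ldots, x_n$ are independent
values of a variate distributed normally about zero, with unit
variance, then the quantity

$$\chi^2 = S(x^2),$$

where $S$, as usual, stands for summation over the sample, has a
distribution given by:

$$df = \frac{1}{\frac{n-2}{2}!}\,\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}\chi^2}\,d\left(\tfrac{1}{2}\chi^2\right).$$
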
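This may be proved in several ways, two of which deserve notice.
(a) By induction, for $n = 1$, the expression reduces to

$$df = \frac{1}{(-\frac{1}{2})!}\,\left(\tfrac{1}{2}\chi^2\right)^{-\frac{1}{2}}\,e^{-\frac{1}{2}\chi^2}\,d\left(\tfrac{1}{2}\chi^2\right) = \sqrt{\frac{2}{\pi}}\,e^{-\frac{1}{2}\chi^2}\,d\chi,$$

which is clearly the distribution of $\chi^2$ for a single observation. If, now,
$2u$ is the sum of the squares of $n$ independent values of the variate,
and has the distribution,

$$df = \frac{1}{\frac{n-2}{2}!}\,u^{\frac{1}{2}(n-2)}\,e^{-u}\,du,$$

and $x$ is an additional observation independent of the others, then

$$\chi^2 = 2u + x^2,$$

and its distribution is to be inferred from the simultaneous distribution

$$df = \frac{1}{\frac{n-2}{2}!}\,\sqrt{\frac{2}{\pi}}\,u^{\frac{1}{2}(n-2)}\,e^{-u - \frac{1}{2}x^2}\,du\,dx.$$

If we now substitute

$$u = \tfrac{1}{2}(\chi^2 - x^2), \qquad du = d\left(\tfrac{1}{2}\chi^2\right),$$

we have

$$df = \frac{1}{\frac{n-2}{2}!}\,\sqrt{\frac{2}{\pi}}\,\left(\frac{\chi^2 - x^2}{2}\right)^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}\chi^2}\,d\left(\tfrac{1}{2}\chi^2\right)\,dx,$$

in which $x$ takes all values from 0 to $\chi$. Integration with respect to $x$
will, therefore, yield a factor $\chi^{n-1}$ or $\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-1)}$ (with a constant which
need not be determined, but which may be obtained from the Eulerian
integral of the first kind), giving the distribution

$$df = \frac{1}{\frac{n-1}{2}!}\,\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-1)}\,e^{-\frac{1}{2}\chi^2}\,d\left(\tfrac{1}{2}\chi^2\right),$$

in accordance with the general formula.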

Although the proof by induction is an attractive exercise in Eulerian 
integrals, many students find an alternative proof based on Euclidean 
hyperspace more simple and direct. 

If $x_1, \ldots, x_n$ are the co-ordinates of a point in such space, the
frequency density at any point is proportional to $e^{-\frac{1}{2}\chi^2}$ and depends only
on the distance $\chi$ of the sample point from the origin. The region in
which this density exceeds any specified value is, therefore, a
hypersphere in $n$ dimensions having volume proportional to $\chi^n$. The volume
in which $\chi$ lies within any elementary range $d\chi$ is, therefore,
proportional to

$$\chi^{n-1}\,d\chi,$$

and the element of frequency in this range is proportional to
$\chi^{n-1}\,e^{-\frac{1}{2}\chi^2}\,d\chi$. The Eulerian integral of the second kind,

$$\int_0^\infty t^p\,e^{-t}\,dt = p!,$$

then supplies the required constant factor and establishes the
distribution of $\chi$ or $\chi^2$.¹


¹ This distribution was first given by Helmert in 1875; it was later found
independently by Pearson, “Student,” and others, in the examination of
various special problems belonging to the wide class in which it occurs.
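The distribution established above can be checked numerically. The following sketch is an illustration only: the function names and the choice of a midpoint rule are assumptions, not part of the paper. It evaluates the frequency element in terms of $\chi^2$ and confirms that the total frequency is unity and that the mean of $\chi^2$ is $n$.

```python
import math


def chi_square_density(x, n):
    """Density of chi-square at x for n degrees of freedom.

    This is (1/((n-2)/2)!) (x/2)^((n-2)/2) e^(-x/2) d(x/2), written per
    unit of x; the factorial of a half-integer is taken as a gamma function.
    """
    half = 0.5 * x
    return 0.5 * half ** (0.5 * n - 1.0) * math.exp(-half) / math.gamma(0.5 * n)


def midpoint_integral(f, lower, upper, steps):
    """Simple midpoint rule; adequate for this smooth integrand."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))


n = 5
total = midpoint_integral(lambda x: chi_square_density(x, n), 0.0, 100.0, 10000)
mean = midpoint_integral(lambda x: x * chi_square_density(x, n), 0.0, 100.0, 10000)
```

With these settings `total` is very close to 1 and `mean` is very close to 5, in agreement with the constant factor supplied by the Eulerian integral.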





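2. The distribution of “Student's” $t$. If we have a value of $\chi^2$ derived
from $n$ independent values, and an additional value $x$ independent of
the others, “Student's” $t$ may be defined as

$$t = \frac{x\sqrt{n}}{\chi}$$

for $n$ degrees of freedom. Writing down the simultaneous distribution
of $x$ and $\chi$, as above, and substituting for $x$ in terms of $t$, we obtain

$$df = \frac{1}{\frac{n-2}{2}!}\,\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}\chi^2}\,d\left(\tfrac{1}{2}\chi^2\right) \cdot \sqrt{\frac{2}{\pi}}\,e^{-\frac{t^2\chi^2}{2n}}\,\frac{\chi}{\sqrt{n}}\,dt;$$

or, putting $u$ for $\tfrac{1}{2}\chi^2$,

$$df = \frac{1}{\frac{n-2}{2}!} \cdot \frac{2}{\sqrt{\pi n}}\,u^{\frac{1}{2}(n-1)}\,e^{-u\left(1 + \frac{t^2}{n}\right)}\,du\,dt.$$

Integration with respect to $u$ from 0 to infinity, recollecting that $t$
must be equally frequently positive or negative, yields the distribution
found by “Student” in 1908:

$$df = \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\;\sqrt{\pi n}} \cdot \frac{dt}{\left(1 + \frac{t^2}{n}\right)^{\frac{1}{2}(n+1)}}.$$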

3. The distribution of $z$. In the most general case arising in the
analysis of variance, we consider two quantities, $\chi_1^2$ and $\chi_2^2$, based
respectively on $n_1$ and $n_2$ values of the variate, all of which are
independent. We may then define $z$ so that

$$e^{2z} = w = \frac{n_2\,\chi_1^2}{n_1\,\chi_2^2},$$

and proceed to find the distribution of $z$. This is carried out, as in the
previous cases, by writing down a simultaneous distribution of $\chi_1$
and $\chi_2$ and making the substitution

$$\chi_1^2 = \frac{n_1}{n_2}\,\chi_2^2\,w.$$

The integration proceeds as before, yielding the general distribution
for $z$,

$$df = 2\,\frac{\frac{n_1+n_2-2}{2}!}{\frac{n_1-2}{2}!\;\frac{n_2-2}{2}!}\;\frac{n_1^{\frac{1}{2}n_1}\,n_2^{\frac{1}{2}n_2}\,e^{n_1 z}\,dz}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1+n_2)}},$$

which is evidently that of the natural logarithm of the ratio between
two independent estimates of the same standard deviation based on
$n_1$ and $n_2$ degrees of freedom, respectively. The wide class of problem
for which $z$ provides the appropriate test of significance is most
easily recognized from this property.

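4. The probability integral of $\chi^2$. The probability integral of the $\chi^2$
distribution is

$$P = \int_{\frac{1}{2}\chi^2}^{\infty} \frac{1}{\frac{n-2}{2}!}\,t^{\frac{1}{2}(n-2)}\,e^{-t}\,dt,$$

which represents the probability of exceeding a given value of $\chi^2$.
Now, integrating by parts,

$$\int_{\frac{1}{2}\chi^2}^{\infty} \frac{1}{r!}\,t^r e^{-t}\,dt = \left[-\frac{1}{r!}\,t^r e^{-t}\right]_{\frac{1}{2}\chi^2}^{\infty} + \int_{\frac{1}{2}\chi^2}^{\infty} \frac{1}{(r-1)!}\,t^{r-1} e^{-t}\,dt$$

$$= \frac{1}{r!}\left(\tfrac{1}{2}\chi^2\right)^r e^{-\frac{1}{2}\chi^2} + \int_{\frac{1}{2}\chi^2}^{\infty} \frac{1}{(r-1)!}\,t^{r-1} e^{-t}\,dt.$$

When $n$ is even, this process terminates, yielding the formula

$$P = e^{-\frac{1}{2}\chi^2}\left\{1 + \tfrac{1}{2}\chi^2 + \frac{1}{2!}\left(\tfrac{1}{2}\chi^2\right)^2 + \cdots + \frac{1}{\frac{n-2}{2}!}\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-2)}\right\}.$$

Thus, for

$n = 2$, $\quad P = e^{-\frac{1}{2}\chi^2}$,

$n = 4$, $\quad P = e^{-\frac{1}{2}\chi^2}\left(1 + \tfrac{1}{2}\chi^2\right)$,

$n = 6$, $\quad P = e^{-\frac{1}{2}\chi^2}\left\{1 + \tfrac{1}{2}\chi^2 + \tfrac{1}{2}\left(\tfrac{1}{2}\chi^2\right)^2\right\}$,

$n = 8$, $\quad P = e^{-\frac{1}{2}\chi^2}\left\{1 + \tfrac{1}{2}\chi^2 + \tfrac{1}{2}\left(\tfrac{1}{2}\chi^2\right)^2 + \tfrac{1}{6}\left(\tfrac{1}{2}\chi^2\right)^3\right\}$,

all of which are easily calculated for a given value of $\tfrac{1}{2}\chi^2$.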

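When $n$ is odd, the same process may be applied, terminating at
$r = \tfrac{1}{2}$; we then have the formula,

$$P = \int_{\frac{1}{2}\chi^2}^{\infty} \frac{1}{(-\frac{1}{2})!}\,t^{-\frac{1}{2}} e^{-t}\,dt + e^{-\frac{1}{2}\chi^2}\left\{\frac{1}{\frac{1}{2}!}\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}} + \frac{1}{\frac{3}{2}!}\left(\tfrac{1}{2}\chi^2\right)^{\frac{3}{2}} + \cdots + \frac{1}{\frac{n-2}{2}!}\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n-2)}\right\}.$$

In the integral, write $\tfrac{1}{2}x^2$ for $t$, and substitute for the fractional
factorials using $(-\tfrac{1}{2})! = \sqrt{\pi}$, and we find

$$P = \sqrt{\frac{2}{\pi}}\int_{\chi}^{\infty} e^{-\frac{1}{2}x^2}\,dx + \sqrt{\frac{2}{\pi}}\,e^{-\frac{1}{2}\chi^2}\left\{\chi + \frac{\chi^3}{3} + \frac{\chi^5}{3 \cdot 5} + \cdots + \frac{\chi^{n-2}}{3 \cdot 5 \cdots (n-2)}\right\}.$$

The integral is the familiar probability integral of the normal curve,
the contribution to $P$ being the total frequency outside the limits $\pm\chi$
times the standard deviation. The series is easily evaluated as before.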
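The even and odd cases can be evaluated directly from these series. The sketch below is illustrative only: the function name `chi_square_tail` is hypothetical, and the normal probability integral is taken from the standard-library function `math.erfc`, using the identity $\sqrt{2/\pi}\int_\chi^\infty e^{-\frac{1}{2}x^2}dx = \operatorname{erfc}(\chi/\sqrt{2})$.

```python
import math


def chi_square_tail(chi_sq, n):
    """Probability P of exceeding chi_sq for n degrees of freedom."""
    half = 0.5 * chi_sq
    if n % 2 == 0:
        # 1 + (chi^2/2) + (chi^2/2)^2/2! + ... up to the power (n-2)/2
        term, total = 1.0, 1.0
        for k in range(1, n // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    chi = math.sqrt(chi_sq)
    # chi + chi^3/3 + chi^5/(3*5) + ... up to the power n-2
    term, total = chi, 0.0
    for k in range(1, (n - 1) // 2 + 1):
        total += term
        term *= chi_sq / (2 * k + 1)
    normal_part = math.erfc(chi / math.sqrt(2.0))
    return normal_part + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


p_even = chi_square_tail(15.507, 8)   # 5 per cent point for n = 8
p_odd = chi_square_tail(7.815, 3)     # 5 per cent point for n = 3
```

Both values come out close to 0.05, as the tabulated 5 per cent points require.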

5. Relation of the $\chi^2$ distribution to the Poisson series. It will be
noticed that, when $n$ is even, the probability of the variate $\tfrac{1}{2}\chi^2$
exceeding any specified value $m$ is

$$e^{-m}\left\{1 + m + \frac{m^2}{2!} + \cdots + \frac{m^{\frac{1}{2}(n-2)}}{\frac{n-2}{2}!}\right\},$$

which is the sum of the first $\tfrac{1}{2}n$ terms of the Poisson series, having the
parameter $m$, or, in other words, the probability that a variate
distributed in such a series is less than $\tfrac{1}{2}n$. This identity is expressed
in the formula

$$\int_m^{\infty} \frac{1}{p!}\,t^p e^{-t}\,dt = e^{-m}\left\{1 + m + \frac{m^2}{2!} + \cdots + \frac{m^p}{p!}\right\},$$

where $p$, which takes the place of $\tfrac{1}{2}(n-2)$, is a positive integer or zero.

Thus, a table of $\chi^2$ can be used as a table of the partial sum of the
Poisson series; in particular, the 5 per cent value of $\chi^2$, which is the
value exceeded once in 20 trials, gives (on halving) the value of $m$,
the “expectation” of the Poisson series of which the first $\tfrac{1}{2}n$ terms
occupy 5 per cent of the frequency.

For example, if $n$ is 8, the 5 per cent value of $\chi^2$ is 15.507;
consequently, we may infer that, if a rare event has been observed only
3 $[= \tfrac{1}{2}(n-2)]$ times, the observation has departed significantly from
any expectation exceeding 7.754 occurrences and, consequently, its
real frequency of occurrence probably does not exceed that which
would give this number in our body of observations. Again, if $n$ is 6,
the 95 per cent point is 1.635, so that, if 3 cases have certainly been
observed, the expectation probably exceeds 0.817, since for this value
95 per cent of the observed numbers will be 0, 1, or 2. We may thus use
the table very simply to show just how much information about the
frequency of rare events is contained in a record of only a few such
occurrences.
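The two numerical examples above can be verified by summing the Poisson series directly. The sketch is illustrative; the function name `poisson_partial_sum` is hypothetical, and the two $\chi^2$ values are those quoted in the text.

```python
import math


def poisson_partial_sum(k, m):
    """Probability that a Poisson variate with expectation m is at most k."""
    term, total = 1.0, 1.0      # term for j = 0, before the factor e^(-m)
    for j in range(1, k + 1):
        term *= m / j           # m^j / j!
        total += term
    return math.exp(-m) * total


# n = 8: half the 5 per cent value 15.507 is the expectation for which
# 0, 1, 2 or 3 occurrences together occupy 5 per cent of the frequency.
p_three_or_fewer = poisson_partial_sum(3, 15.507 / 2)

# n = 6: half the 95 per cent value 1.635 is the expectation for which
# 0, 1 or 2 occurrences together occupy 95 per cent of the frequency.
p_two_or_fewer = poisson_partial_sum(2, 1.635 / 2)
```

The first sum is close to 0.05 and the second close to 0.95, reproducing the expectations 7.754 and 0.817 quoted above.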

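6. The probability integral of “Student's” $t$ distribution. It has
already been shown that the distribution of the ratio $t$ of a deviation to its
standard error as estimated from $n$ degrees of freedom is

$$df = \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\;\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)} dt;$$

or, if $\tan\theta$ is written for $t/\sqrt{n}$,

$$df = \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\;\sqrt{\pi}}\,\cos^{n-1}\theta\,d\theta.$$

Then the probability of exceeding a given value of $t$ is

$$P = \int_{\alpha}^{\frac{1}{2}\pi} \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\;\sqrt{\pi}}\,\cos^{n-1}\theta\,d\theta,$$

where $t = \sqrt{n}\,\tan\alpha$.

Now, integrating by parts, it appears that

$$\int_{\alpha}^{\frac{1}{2}\pi} \frac{(r+\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r+1}\theta\,d\theta = \left[\frac{(r+\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r}\theta\,\sin\theta\right]_{\alpha}^{\frac{1}{2}\pi} + 2\int_{\alpha}^{\frac{1}{2}\pi} \frac{(r+\frac{1}{2})!}{(r-1)!\,\sqrt{\pi}}\,\cos^{2r-1}\theta\,\sin^2\theta\,d\theta,$$

which, so long as $r$ is positive, is equal to

$$-\frac{(r+\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r}\alpha\,\sin\alpha + 2\int_{\alpha}^{\frac{1}{2}\pi} \frac{(r+\frac{1}{2})!}{(r-1)!\,\sqrt{\pi}}\left(\cos^{2r-1}\theta - \cos^{2r+1}\theta\right)d\theta.$$

Hence,

$$\int_{\alpha}^{\frac{1}{2}\pi} \frac{(r+\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r+1}\theta\,d\theta = \int_{\alpha}^{\frac{1}{2}\pi} \frac{(r-\frac{1}{2})!}{(r-1)!\,\sqrt{\pi}}\,\cos^{2r-1}\theta\,d\theta - \frac{1}{2}\,\frac{(r-\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r}\alpha\,\sin\alpha,$$

when $r$ is positive; but, when $r = 0$,

$$\int_{\alpha}^{\frac{1}{2}\pi} \frac{(r+\frac{1}{2})!}{r!\,\sqrt{\pi}}\,\cos^{2r+1}\theta\,d\theta = \tfrac{1}{2}(1 - \sin\alpha).$$

Hence,

$$P = \tfrac{1}{2} - \tfrac{1}{2}\sin\alpha\left\{1 + \tfrac{1}{2}\cos^2\alpha + \frac{1 \cdot 3}{2 \cdot 4}\cos^4\alpha + \cdots + \frac{1 \cdot 3 \cdot 5 \cdots (n-3)}{2 \cdot 4 \cdot 6 \cdots (n-2)}\cos^{n-2}\alpha\right\}$$

when $n$ is even. When $n$ is odd, we proceed in the same way until $r = \tfrac{1}{2}$,
and obtain

$$P = \tfrac{1}{2} - \frac{\alpha}{\pi} - \frac{\sin\alpha}{\pi}\left\{\cos\alpha + \tfrac{2}{3}\cos^3\alpha + \cdots + \frac{2 \cdot 4 \cdots (n-3)}{3 \cdot 5 \cdots (n-2)}\cos^{n-2}\alpha\right\}.$$

As in the case of $\chi^2$, when $n$ is odd a transcendental function is
required, in this case an inverse circular function, whereas when $n$ is
even, $P$ is expressed as a function algebraic in $t$.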
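The two trigonometric series above lend themselves to direct computation. The following sketch is an illustration under the notation of this section ($t = \sqrt{n}\tan\alpha$, with $P$ the probability of exceeding $t$); the function name `student_tail` is hypothetical.

```python
import math


def student_tail(t, n):
    """Probability P that Student's ratio exceeds t, for n degrees of freedom."""
    alpha = math.atan(t / math.sqrt(n))
    s, c = math.sin(alpha), math.cos(alpha)
    if n % 2 == 0:
        # 1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ... up to cos^(n-2)
        term, total = 1.0, 1.0
        for k in range(1, n // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        return 0.5 - 0.5 * s * total
    # cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ... up to cos^(n-2)
    term, total = c, 0.0
    for k in range(1, (n - 1) // 2 + 1):
        total += term
        term *= (2 * k) / (2 * k + 1) * c * c
    return 0.5 - alpha / math.pi - s * total / math.pi


p_even = student_tail(2.228, 10)   # two-sided 5 per cent point, n = 10
p_odd = student_tail(2.571, 5)     # two-sided 5 per cent point, n = 5
```

Each value is close to 0.025, so that $2P$, the probability of lying outside the limits $\pm t$, is close to 5 per cent in both cases.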





“Student” has given (2) four-figure tables of $P$ up to $n = 20$; beyond
this value a good asymptotic expansion is available (3). “Student's”
tables are for $1 - P$ in the notation used above, and represent the
probability of a value less than any given positive value of $t$. Since the
distribution is symmetrical about zero, this probability is never less
than $\tfrac{1}{2}$. For tests of significance, we often require the probability, $2P$,
that the observed ratio of a deviation to its estimated standard error
shall lie outside the limits $\pm t$, or the complementary probability,
$1 - 2P$, that it shall lie within these limits.

It will help to make clear the analogy with the more general test of
significance given by $z$, of which the $\chi^2$ and $t$ tests are special cases, to
observe that, when $n$ is even, the expansion for $1 - 2P$ is

$$\sin\alpha\left\{1 + \tfrac{1}{2}\cos^2\alpha + \frac{1 \cdot 3}{2 \cdot 4}\cos^4\alpha + \cdots + \frac{1 \cdot 3 \cdot 5 \cdots (n-3)}{2 \cdot 4 \cdot 6 \cdots (n-2)}\cos^{n-2}\alpha\right\},$$

or, in terms of $\dfrac{n}{n + t^2} = \cos^2\alpha = q$,

$$(1 - q)^{\frac{1}{2}}\left\{1 + \tfrac{1}{2}q + \frac{1 \cdot 3}{2 \cdot 4}q^2 + \cdots + \frac{1 \cdot 3 \cdot 5 \cdots (n-3)}{2 \cdot 4 \cdot 6 \cdots (n-2)}q^{\frac{1}{2}(n-2)}\right\},$$

in which the expression within the bracket is the first $\tfrac{1}{2}n$ terms of the
binomial expansion

$$(1 - q)^{-\frac{1}{2}}.$$

Just as the probability integral of $\chi^2$ gives the partial sum of a
Poisson series, so, therefore, does the probability integral of $t$ give the
partial sum of a special type of binomial expansion; in each case the
external factor is the inverse of the sum of the complete series, and the
identity holds for all even values of $n$.

7. The probability integral of the distribution of $z$. The distribution
of $z$ involves two whole numbers, $n_1$ and $n_2$, which are the numbers of
degrees of freedom in the two lines of the analysis of variance to be
compared, and is given by the general formula,

$$df = 2\,\frac{\frac{n_1+n_2-2}{2}!}{\frac{n_1-2}{2}!\;\frac{n_2-2}{2}!}\;\frac{n_1^{\frac{1}{2}n_1}\,n_2^{\frac{1}{2}n_2}\,e^{n_1 z}\,dz}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1+n_2)}}.$$

Writing

$$q = \frac{n_1 e^{2z}}{n_1 e^{2z} + n_2},$$

this becomes

$$df = \frac{\frac{n_1+n_2-2}{2}!}{\frac{n_1-2}{2}!\;\frac{n_2-2}{2}!}\;q^{\frac{1}{2}(n_1-2)}\,(1 - q)^{\frac{1}{2}(n_2-2)}\,dq.$$

Now,

$$\int_q^1 \frac{(r+s+1)!}{r!\,s!}\,x^r (1-x)^s\,dx = \left[-\frac{(r+s+1)!}{r!\,(s+1)!}\,x^r (1-x)^{s+1}\right]_q^1 + \int_q^1 \frac{(r+s+1)!}{(r-1)!\,(s+1)!}\,x^{r-1}(1-x)^{s+1}\,dx$$

$$= \frac{(r+s+1)!}{r!\,(s+1)!}\,q^r (1-q)^{s+1} + \int_q^1 \frac{(r+s+1)!}{(r-1)!\,(s+1)!}\,x^{r-1}(1-x)^s\,dx - \frac{r}{s+1}\int_q^1 \frac{(r+s+1)!}{r!\,s!}\,x^r (1-x)^s\,dx.$$

This establishes the recurrence relation

$$\int_q^1 \frac{(r+s+1)!}{r!\,s!}\,x^r (1-x)^s\,dx = \frac{(r+s)!}{r!\,s!}\,q^r (1-q)^{s+1} + \int_q^1 \frac{(r+s)!}{(r-1)!\,s!}\,x^{r-1}(1-x)^s\,dx.$$

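Hence, when $n_1$ is even, the probability of

$$\frac{n_1 e^{2z}}{n_1 e^{2z} + n_2}$$

exceeding any fractional value $q$ is

$$P = (1 - q)^{\frac{1}{2}n_2}\left\{1 + \frac{n_2}{2}\,q + \frac{n_2(n_2+2)}{2 \cdot 4}\,q^2 + \cdots + \frac{n_2(n_2+2) \cdots (n_1+n_2-4)}{2 \cdot 4 \cdots (n_1-2)}\,q^{\frac{1}{2}(n_1-2)}\right\}, \tag{A}$$

of which the terms in the second component are the first $\tfrac{1}{2}n_1$ terms of
the binomial expansion

$$(1 - q)^{-\frac{1}{2}n_2}.$$

From this expansion, when $n_1$ is even and not too large, the probability
may be easily calculated.

Alternatively, using the direct result of integration by parts, we
shall find the alternative expression

$$P = (1 - q)^{\frac{1}{2}(n_1+n_2-2)}\left\{1 + \frac{n_1+n_2-2}{2}\cdot\frac{q}{1-q} + \frac{(n_1+n_2-2)(n_1+n_2-4)}{2 \cdot 4}\left(\frac{q}{1-q}\right)^2 + \cdots + \frac{(n_1+n_2-2) \cdots (n_2+2)}{2 \cdot 4 \cdots (n_1-2)}\left(\frac{q}{1-q}\right)^{\frac{1}{2}(n_1-2)}\right\}, \tag{B}$$

involving the first $\tfrac{1}{2}n_1$ terms of the expansion of the positive binomial

$$\left(1 + \frac{q}{1-q}\right)^{\frac{1}{2}(n_1+n_2-2)}.$$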




The probability integral of $z$, when $n_1$ is even, is thus equivalent to
the sum of $\tfrac{1}{2}n_1$ terms of a negative binomial in form (A), or of a positive
binomial in form (B). It is of some historical interest that the
probability integral of the normal distribution was first introduced by De
Moivre as an approximation to the sum of a terminating series of
binomial terms. Indeed, had the eighteenth century mathematicians
possessed greater analytic power, the distribution of $z$, which was
unknown to statisticians up to about 10 years ago, might have been
studied before the normal distribution.
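Forms (A) and (B) can be compared numerically. The sketch below is an illustration only, assuming $n_1$ even; the function names are hypothetical, and the value 3.478 used in the example is the familiar tabulated 5 per cent point of the variance ratio $e^{2z}$ for $n_1 = 4$, $n_2 = 10$.

```python
import math


def _q(z, n1, n2):
    """q = n1 e^(2z) / (n1 e^(2z) + n2)."""
    w = n1 * math.exp(2.0 * z)
    return w / (w + n2)


def z_tail_form_a(z, n1, n2):
    """P of exceeding z by form (A): partial sum of a negative binomial."""
    q = _q(z, n1, n2)
    term, total = 1.0, 1.0
    for k in range(1, n1 // 2):
        term *= (n2 + 2 * (k - 1)) / (2 * k) * q
        total += term
    return (1.0 - q) ** (0.5 * n2) * total


def z_tail_form_b(z, n1, n2):
    """P of exceeding z by form (B): partial sum of a positive binomial."""
    q = _q(z, n1, n2)
    m = 0.5 * (n1 + n2 - 2)
    ratio = q / (1.0 - q)
    term, total = 1.0, 1.0
    for k in range(1, n1 // 2):
        term *= (m - (k - 1)) / k * ratio
        total += term
    return (1.0 - q) ** m * total


z_five_per_cent = 0.5 * math.log(3.478)        # n1 = 4, n2 = 10
p_a = z_tail_form_a(z_five_per_cent, 4, 10)
p_b = z_tail_form_b(z_five_per_cent, 4, 10)
```

The two forms agree to rounding error and both give a probability close to 0.05. Only $n_1$ need be even; $n_2$ may be odd, the coefficients then being halves of odd integers.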

If $n_2$ tends to infinity and $n_2 q$ to the limiting value $\chi^2$, both the forms
(A) and (B) tend to the form

$$e^{-\frac{1}{2}\chi^2}\left\{1 + \tfrac{1}{2}\chi^2 + \frac{1}{2!}\left(\tfrac{1}{2}\chi^2\right)^2 + \cdots + \frac{1}{\frac{n_1-2}{2}!}\left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(n_1-2)}\right\},$$

which we obtained for the distribution of $\chi^2$, if we identify $n_1$ of the
general case with $n$ of the $\chi^2$ distribution. The distribution of $\chi^2$ is,
thus, as is obvious from its statistical derivation, the limiting case,
when $n_2$ is infinite, of the general distribution, the substitution being

$$e^{2z} = \frac{\chi^2}{n}, \qquad n = n_1.$$

Again, if $n_2 = 1$, the expression is evidently equivalent to that
obtained for the probability that “Student’s” test of significance $t$ shall
lie within the limits $\pm t$; but

$$\frac{1 - q}{q} = \frac{t^2}{n} = \frac{e^{-2z}}{n_1};$$

hence the probability that $z$ shall exceed a given value is the
probability that $t$ shall lie within the limits $\pm e^{-z}$, when $n_2 = 1$, $n_1 = n$.

Since $z$ is the logarithm of the ratio of the estimates of the standard
deviation derived respectively from $n_1$ and $n_2$ degrees of freedom, it
follows that, if we interchange $n_1$ and $n_2$ and change the sign of $z$, the
expression for the distribution is the same as before. Consequently,
when $n_2$ is even, the probability integral may be expressed as the sum
of $\tfrac{1}{2}n_2$ terms of a binomial expansion. The expression corresponding to
(A) is, writing $p$ for $1 - q$,

$$1 - P = (1 - p)^{\frac{1}{2}n_1}\left\{1 + \frac{n_1}{2}\,p + \frac{n_1(n_1+2)}{2 \cdot 4}\,p^2 + \cdots + \frac{n_1(n_1+2) \cdots (n_1+n_2-4)}{2 \cdot 4 \cdots (n_2-2)}\,p^{\frac{1}{2}(n_2-2)}\right\}, \tag{A′}$$

illustrating that, for $n_1 = 1$, $n_2 = n$, the probability that $z$ exceeds any
given value is the probability that $t$ will fall outside the limits $\pm e^{z}$,
and that, when $n_1$ is infinite, the $\chi^2$ distribution is given by the
transformation

$$e^{-2z} = \frac{\chi^2}{n}, \qquad n = n_2.$$

Corresponding to expression (B), we have

$$1 - P = q^{\frac{1}{2}(n_1+n_2-2)}\left\{1 + \frac{n_1+n_2-2}{2}\cdot\frac{1-q}{q} + \cdots + \frac{(n_1+n_2-2) \cdots (n_1+2)}{2 \cdot 4 \cdots (n_2-2)}\left(\frac{1-q}{q}\right)^{\frac{1}{2}(n_2-2)}\right\}. \tag{B′}$$

When both $n_1$ and $n_2$ are even, it appears that $P$ and $1 - P$ are the
first $\tfrac{1}{2}n_1$ terms and the remaining $\tfrac{1}{2}n_2$ terms of the expansion of

$$(p + q)^{\frac{1}{2}(n_1+n_2-2)},$$

where

$$\frac{q}{p} = \frac{n_1}{n_2}\,e^{2z},$$

the ratio of the sums of squares in the analysis of variance.

In cases where either $n_1$ or $n_2$ is even, the probability integral can
be expressed as the incomplete sum of a binomial series, either with
positive or negative exponent, the exponent being the half of an odd
integer if either $n_1$ or $n_2$ is odd. The case remains to be considered in
which both $n_1$ and $n_2$ are odd.

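For this case we apply the recurrence solution as far as $r = \tfrac{1}{2}$,
obtaining

$$P = \int_q^1 \frac{\frac{n_2-1}{2}!}{\frac{n_2-2}{2}!\;\sqrt{\pi}}\,x^{-\frac{1}{2}}(1-x)^{\frac{1}{2}(n_2-2)}\,dx + 2\,\frac{\frac{n_2-1}{2}!}{\frac{n_2-2}{2}!\;\sqrt{\pi}}\,q^{\frac{1}{2}}(1-q)^{\frac{1}{2}n_2}\left\{1 + \frac{n_2+1}{3}\,q + \cdots + \frac{(n_2+1) \cdots (n_2+n_1-4)}{3 \cdot 5 \cdots (n_1-2)}\,q^{\frac{1}{2}(n_1-3)}\right\}.$$

The numerical coefficient of the second term, when $n_2$ is odd, is

$$\frac{2}{\pi}\cdot\frac{2 \cdot 4 \cdots (n_2-1)}{1 \cdot 3 \cdots (n_2-2)},$$

and the integral remaining at this stage is just double the one which
has been already evaluated for the $t$ distribution; so, putting

$$x = \sin^2\theta, \qquad q = \sin^2\alpha = \frac{n_1 e^{2z}}{n_1 e^{2z} + n_2},$$

we find, since $n_2$ is odd,

$$P = 1 - \frac{2\alpha}{\pi} - \frac{2\sin\alpha}{\pi}\left\{\cos\alpha + \tfrac{2}{3}\cos^3\alpha + \cdots + \frac{2 \cdot 4 \cdots (n_2-3)}{3 \cdot 5 \cdots (n_2-2)}\cos^{n_2-2}\alpha\right\}$$

$$\qquad + \frac{2}{\pi}\cdot\frac{2 \cdot 4 \cdots (n_2-1)}{1 \cdot 3 \cdots (n_2-2)}\,\sin\alpha\,\cos^{n_2}\alpha\left\{1 + \frac{n_2+1}{3}\sin^2\alpha + \cdots + \frac{(n_2+1) \cdots (n_1+n_2-4)}{3 \cdot 5 \cdots (n_1-2)}\sin^{n_1-3}\alpha\right\}$$

in terms of $\alpha$, where $\alpha$ is connected with $z$ by the equation

$$\tan^2\alpha = \frac{n_1}{n_2}\,e^{2z}.$$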
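The formula for the case in which both $n_1$ and $n_2$ are odd can be sketched in code as follows. This is an illustration only; the function name `z_tail_both_odd` is hypothetical, and the value 5.4095 used in the example is the familiar tabulated 5 per cent point of the variance ratio $e^{2z}$ for $n_1 = 3$, $n_2 = 5$.

```python
import math


def z_tail_both_odd(z, n1, n2):
    """P of exceeding z when n1 and n2 are both odd."""
    alpha = math.atan(math.sqrt(n1 / n2) * math.exp(z))   # tan^2 = (n1/n2)e^(2z)
    s, c = math.sin(alpha), math.cos(alpha)

    # cos + (2/3)cos^3 + ... + [2*4...(n2-3)]/[3*5...(n2-2)] cos^(n2-2)
    series_t, term = 0.0, c
    for k in range(1, (n2 - 1) // 2 + 1):
        series_t += term
        term *= (2 * k) / (2 * k + 1) * c * c

    # coefficient (2/pi) * [2*4...(n2-1)] / [1*3...(n2-2)]
    coeff = 2.0 / math.pi
    for k in range(1, (n2 - 1) // 2 + 1):
        coeff *= (2 * k) / (2 * k - 1)

    # 1 + (n2+1)/3 sin^2 + ... up to sin^(n1-3); empty when n1 = 1
    series_z, term = 0.0, 1.0
    for k in range(1, (n1 - 1) // 2 + 1):
        series_z += term
        term *= (n2 + 2 * k - 1) / (2 * k + 1) * s * s

    return (1.0 - 2.0 * alpha / math.pi
            - 2.0 * s / math.pi * series_t
            + coeff * s * c ** n2 * series_z)


z_point = 0.5 * math.log(5.4095)           # n1 = 3, n2 = 5
p_upper = z_tail_both_odd(z_point, 3, 5)
p_reversed = z_tail_both_odd(-z_point, 5, 3)
```

Here `p_upper` is close to 0.05, and `p_upper + p_reversed` equals 1, which is the symmetry obtained by interchanging $n_1$ and $n_2$ and changing the sign of $z$.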




Tables for $z$ for the 5 per cent and the 1 per cent points of the
distribution have been given for $n_1$ = 1, 2, 3, 4, 5, 6, 8, 12, 24, $\infty$; the last
five values are in harmonic progression and enable the table to be
interpolated in the manner which I have called asymptotic interpolation.
For $n_2$, I have given values from 1 to 30, together with 60 and $\infty$;
in this case again the series of values for 20, 30, 60, and $\infty$ may be
used for asymptotic interpolation and the table thus gives four-figure
values of $z$, an accuracy fully sufficient for all common purposes for all
combinations of $n_1$ and $n_2$ except the region in which $n_1$ exceeds 24 and
$n_2$ exceeds 30.

R. A. Fisher

University College, London


REFERENCES 

1. Helmert (1875), “Über die Berechnung des wahrscheinlichen Fehlers aus
einer endlichen Anzahl wahrer Beobachtungsfehler,” Zeitschrift für
Mathematik und Physik, xx, 300-303.

2. “Student,” New Tables for Testing Significance of Observations (1925),
Metron, v, 105-120.

3. R. A. Fisher, Expansion of “Student's” Integral in Powers of $n^{-1}$ (1925).

4. A. de Moivre, The Doctrine of Chances (1756).





14 

THE GENERAL SAMPLING DISTRIBUTION OF 
THE MULTIPLE CORRELATION 
COEFFICIENT 


AUTHOR’S NOTE 

The introductory section says almost all that need be said about
this paper. In 1938 Bose and Roy found that Mahalanobis’
statistic, used for measuring the “generalised distance” in
multivariate analysis, was of the form of (C) as defined here. Both (A)
and (C) will, I am sure, reappear in other problems, and their
tabulation, perhaps in a form comparable with that of the limiting
distribution (B) given in this paper, is much to be desired. Since (B) is
the limiting form as $n_2 \to \infty$ of both (A) and (C), the simplest
approach to the problem of tabulation would seem to be to add tables
like that of (B) for $n_2$ = 144, 36, 16, 9, interpolation among which
would make the general functions readily accessible for all
reasonably large values of $n_2$. The distributions for small even values, if
required, are easily obtained from the elementary cases set out in
this paper.


* Reprinted from Proceedings of the Royal Society, Series A, Vol. 121, pp. 664— 
673, 1928. 



The General Sampling Distribution of the Multiple Correlation
Coefficient.

By R. A. Fisher, Sc.D., Rothamsted Experimental Station, Harpenden,
Herts.


1. Introductory. 

Of the problems of the exact distribution of statistics in common use that 
of the multiple correlation coefficient is the last to have resisted solution. It 
will be seen that the solution introduces an extensive group of distributions, 
occurring naturally in the most diverse types of statistical investigation, and 
which in their mere mathematical structure supply an extension in a new 
direction of the entire system of distributions previously obtained, which in 
their generality underlie the analysis of variance. The individual distributions 
of this system were in each case obtained by the exact investigation of a 
particular problem. It was realised only gradually that many of these distributions,
disguised by the different notations appropriate to different problems,
were in reality equivalent, and could be made available in practice by a single
system of tables. The remaining cases, with the notable exception of the
correlation coefficient, then fall into place as particular limiting forms of a
single general distribution. As the practical utility of these earlier solutions
depends greatly on a recognition of their place in a single system, a very brief
account of their mutual relationship may be given.

The only statistic derived from samples of a continuous variate, of which the 
distribution was known before the present century, appears to be the arithmetic 
mean of a sample drawn from the normal distribution. In addition, however, 
two distributions which may be regarded as distributions of statistics had also 
been found, namely, Bernoulli's binomial distribution, and Poisson's series.
Both of these distributions possess the property that the aggregate of the
values of a sample is itself distributed in a distribution of the same type. In
all three classical cases, therefore, the distribution of the statistic derived from 
a finite sample was known only by a mathematical simplification of this special 
type. In all other cases, approximations of unknown accuracy based on the 
use of the standard error and the assumption of normal distributions had 
perforce to be used. 




In 1908 “Student”* attacked the problem of the distribution of the mean
of a normal sample measured, as in practice it must usually be, in terms of
the standard error as estimated from the same sample. He was thus incidentally
led to the equally fundamental distribution of the variance of a normal
sample. This latter, to which the general distribution of the analysis of
variance degenerates when $n_2$ tends to infinity, is in reality equivalent to the
distribution found by Pearson† in 1900 for the measure of discrepancy
developed for testing goodness of fit. From this “Student” was able to
derive the exact distribution (the distribution of $t$) of the mean of a unique
sample, which as subsequently appeared falls into the same system with
$n_1 = 1$. The two principal limiting forms of the general distribution were
thus known in 1908, and were available for practical application by means of
Elderton's‡ and “Student's” first tables.

In 1915 the distribution of the coefficient of correlation was obtained§ by a
use of Euclidean hyper-space similar to that employed below. The same
method served at the same time to put “Student's” results upon a rigorous
basis. The distribution of the correlation coefficient stands outside the
analysis of variance system, but, as will be seen in the present paper, it is
brought into coherent connection with it by the distribution of the coefficient
of multiple correlation. When, however, the corresponding distribution of
the intraclass correlation was obtained‖, the distribution found was of a new
and different type, which, as subsequently appeared, was the general distribution
of the analysis of variance, in which the variance is analysed into two
parts representing that within and that between the classes or “fraternities”
of which the data are composed. This was the first instance of the general
distribution which from the notation there used is distinguished as the distribution
of $z$.

The recognition of the fundamental importance of the two parameters, $n_1$
and $n_2$, which specify the numbers of degrees of freedom in the two estimates
of variance to be compared, and the recognition of the distribution of $\chi^2$ as
equivalent to that of an estimate of variance led, in 1922 and the following two
years,¶ to the demonstration that it is always the number of degrees of freedom

* “Student,” 1908. “The probable error of a mean,” Biometrika, vol. 6,
pp. 1-25.

† K. Pearson, 1900. “On the criterion that a given system of deviations from
the probable in the case of a correlated system of variables is such that it can be
reasonably supposed to have arisen from random sampling.” Phil. Mag., vol. 50,
pp. 157-175.

‡ P. Elderton, 1902. “Tables for testing goodness of fit.” Biometrika, vol. 1,
p. 155.

§ R. A. Fisher, 1915. “Frequency distribution of the values of the correlation
coefficient in samples from an indefinitely large population.” Biometrika, vol. 10,
pp. 507-521.

‖ R. A. Fisher, 1921. “On the ‘probable error’ of a coefficient of correlation
deduced from a small sample.” Metron, vol. 1, No. 4, pp. 1-32.

¶ R. A. Fisher, 1922. “On the interpretation of $\chi^2$ from contingency tables,
and the calculation of P.” J. Roy. Stat. Soc., vol. LXXXV, pt. I, pp. 87-94.
R. A. Fisher and A. L. Bowley, 1923. “Statistical tests of agreement between
observation and hypothesis.” Economica, No. 8, pp. 139-147. R. A. Fisher, 1924.
“The conditions under which $\chi^2$ measures the discrepancy between observation
and hypothesis.” J. Roy. Stat. Soc., vol. LXXXVII, pt. III, pp. 442-449.




which is to be used in applying the test of goodness of fit. The further proof
that the test is only valid when the methods of estimation employed have
been efficient, binds the theory of goodness of fit closely to that of estimation
in the development of which the exact distributions of statistics play an essential
part.*

Meanwhile,† a solution of the exact distribution of $\chi^2$ when applied to test
the goodness of fit of regression formulae had shown that a modification was
required in this case, which, in fact, involved dropping the approximate assumption
that $n_2$ was infinite, and reduced the general distribution to the same form
as that already found in the study of intraclass correlation. At the same time,
the distribution of the correlation ratio, $\eta$, derived from uncorrelated material,
was shown to belong to the same class with $n_1$ equal to one less than the number
of arrays; and the distribution of regression coefficients, whether total or
partial, and whether employed in a linear or a non-linear formula, were shown
to conform to “Student's” distribution. The solution of the distribution
of the correlation ratio $\eta$ really included also that of the multiple correlation
coefficient for samples drawn from uncorrelated material, the distribution of
which was given in its appropriate notation in 1924.‡

In the same year§ it was found possible to use the representation in hyper-space
to demonstrate that the distribution of the partial correlation coefficients
is exactly the same as that primarily found for the total correlation, provided 
that unity is deducted from the sample number for each variate eliminated. 

Each distinct type of distribution found has thus occurred repeatedly in
different investigations; whereas, however, nearly all cases are reducible to
a common type capable of exact treatment by the same simple arithmetical
procedure,‖ and requiring the same fundamental table, the distribution of the
correlation coefficient, total or partial, stood aside from the main
system, and was capable of only an approximate treatment by using the
distribution of $z$.

The distribution of the multiple correlation coefficient, apart from the
practical necessity of applying to observed results sufficiently exact tests of
significance, is thus of great theoretical interest owing to the close connection which
must exist between it and the simple correlation coefficient, on the one hand,
and, on the other, to the form already obtained from uncorrelated material.

* R. A. Fisher, 1922. “On the mathematical foundations of theoretical statistics.”
Phil. Trans. Roy. Soc. London, (A), vol. 222, pp. 309-368.

† R. A. Fisher, 1922. “The goodness of fit of regression formulae and the
distribution of regression coefficients.” J. Roy. Stat. Soc., vol. LXXXV, pt. IV,
pp. 597-612.

‡ R. A. Fisher, 1924. “The influence of rainfall on the yield of wheat at
Rothamsted.” Phil. Trans. Roy. Soc. London, (B), vol. 213, pp. 89-142.

§ R. A. Fisher, 1924. “The distribution of the partial correlation coefficient.”
Metron, vol. 3, pp. 329-332.

‖ R. A. Fisher, 1928. Statistical Methods for Research Workers. Oliver &
Boyd, 2nd ed.




The latter solution involves, besides the variate and frequency, the two
parameters $n_1$ and $n_2$, and is therefore a functional relation between four
variables. The new solution necessarily involves also the multiple correlation
in the population sampled, making a fifth variable; complete tabulation of
the results would thus require a table of fourfold entry; even confining
attention to specified points of special importance, such as the 5 per cent. and
1 per cent. points, a procedure that has made tabulation practicable for the
distribution of $z$, we should still have tables of triple entry. The problem of
adequate tabulation is certainly not insurmountable, but to ascertain the
proper method to adopt in its presentation will require further study of the
nature of the function. The table of the 5 per cent. points of the distribution
of B (Section 5) should in the author's opinion provide sufficient guidance for
the greater number of practical applications.


2. Method of Solution.

The primary problem of the sampling distribution of the correlation coefficient
between two variates, $x$ and $y$, was originally solved by interpreting the
n individual values of either variate appearing in the sample as the co-ordinates 
of a point in Euclidean space of n dimensions. It then easily appeared that 
the correlation coefficient between the variates was the cosine of the angle 
between the two radii vectores drawn from the origin to points, the co-ordinates 
of which represented the deviations from the mean of the sample of the two 
variates concerned. 

The frequency with which r, the observed correlation coefficient, falls in 
any infinitesimal range dr may be usefully thought of as the product of two 
factors, one being the generalised volume in which the second sample point 
may lie so that the correlation may fall within the assigned range, this value 
being independent of the correlational properties of the population sampled, 
while the second is a factor by which the frequency density in any element of 
volume is modified by the correlation between x and y in the population. 
With zero correlation in the population, the frequency density at any point 
depends only on its distance from the origin, and since for any given distance 
the point is free to move over a sphere in (n — 1) dimensions, one dimension 
having been eliminated by using the sample mean as origin, it is easy to see 
that for this case the frequency distribution of r is given by 


$$df = \frac{[\tfrac12(n-3)]!}{[\tfrac12(n-4)]!\,\sqrt{\pi}}\,(1-r^2)^{\frac12(n-4)}\,dr.$$
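As a numerical check on the formula just given, the following Python sketch evaluates the density of $r$ for an uncorrelated population and integrates it over the whole range. It reads Fisher's factorial $x!$ as $\Gamma(x+1)$; the function names and the sample number $n = 10$ are illustrative choices, not part of the original paper.

```python
import math

def r_null_density(r, n):
    """Density of r for a sample of n pairs when the population correlation is zero."""
    # Fisher's x! is Gamma(x + 1), so [(n-3)/2]! = Gamma((n-1)/2), and so on.
    const = math.gamma((n - 1) / 2) / (math.gamma((n - 2) / 2) * math.sqrt(math.pi))
    return const * (1.0 - r * r) ** ((n - 4) / 2)

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# The whole frequency should be accounted for between r = -1 and r = +1.
total_mass = simpson(lambda r: r_null_density(r, 10), -1.0, 1.0)
print(total_mass)
```

For $n = 4$ the exponent vanishes and the density is uniform on $(-1, 1)$, which gives a quick sanity check on the constant.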




The general solution of the primary problem*

$$df = \frac{n-2}{\pi}\,(1-\rho^2)^{\frac12(n-1)}\,(1-r^2)^{\frac12(n-4)}\int_0^{\infty}\frac{dz}{(\cosh z-\rho r)^{n-1}}\;dr$$

may be written with advantage

$$df = \frac{[\tfrac12(n-3)]!}{[\tfrac12(n-4)]!\,\sqrt{\pi}}\,(1-r^2)^{\frac12(n-4)}\,dr \;\times\; \frac{[\tfrac12(n-2)]!}{[\tfrac12(n-3)]!\,\sqrt{\pi}}\,(1-\rho^2)^{\frac12(n-1)}\int_{-\infty}^{\infty}\frac{dz}{(\cosh z-\rho r)^{n-1}}.$$

The second factor then represents the effect upon the frequency density, in
the region represented by $dr$, of a correlation $\rho$ in the sampled population:
the numerical part of this factor is merely such as to reduce it to unity when
$\rho = 0$.
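The two-factor form can be checked numerically. The sketch below, in Python with only the standard library, evaluates the second (density) factor by Simpson's rule, confirms that it reduces to unity when $\rho = 0$, and confirms that the product of the two factors integrates to unity over $r$. The function names, the truncation of the $z$-integral at 20, and the values $n = 10$, $\rho = 0.6$ are assumptions made for illustration.

```python
import math

def simpson(f, a, b, m):
    """Composite Simpson rule on [a, b] with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def null_density(r, n):
    """First factor: density of r when rho = 0 (x! read as Gamma(x + 1))."""
    c = math.gamma((n - 1) / 2) / (math.gamma((n - 2) / 2) * math.sqrt(math.pi))
    return c * (1 - r * r) ** ((n - 4) / 2)

def density_factor(r, rho, n, z_max=20.0, m=2000):
    """Second factor: effect of a population correlation rho on the density."""
    c = math.gamma(n / 2) / (math.gamma((n - 1) / 2) * math.sqrt(math.pi))
    # The integrand is even in z, so integrate over [0, z_max] and double.
    integral = 2 * simpson(lambda z: (math.cosh(z) - rho * r) ** (1 - n), 0.0, z_max, m)
    return c * (1 - rho * rho) ** ((n - 1) / 2) * integral

factor_at_zero = density_factor(0.5, 0.0, 10)
total_mass = simpson(lambda r: null_density(r, 10) * density_factor(r, 0.6, 10),
                     -1.0, 1.0, 200)
print(factor_at_zero, total_mass)
```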

With multiple correlations we are concerned with the correlations between
a dependent variate $y$, and a number of independent variates, $x_1, x_2, \ldots, x_{n_1}$,
and, moreover, with the correlations of the latter among themselves. It was
not at first obvious that the sampling distribution did not involve this whole
matrix of correlations, in which case, even if it could be determined, it would
be of no practical use. The argument, by which it can be seen to depend on
only a single parameter of the population, is therefore of special interest,
as by its general character it is applicable to a number of statistical problems,
and leads in this case directly to the solution.

The multiple correlation of $y$ with $x_1, x_2, \ldots, x_{n_1}$ is the correlation between $y$
and that linear function of $x_1, x_2, \ldots, x_{n_1}$ with which its correlation is highest.
If, therefore, for the independent variates, $x$, we substitute an equal number of
new variates, $\xi$, defined as linear functions of the variates, $x$, then the
multiple correlation in the population, and in every sample from it, will remain
unchanged. In particular we may choose as $\xi_1$ that linear function the correlation
of which with $y$ in the population sampled is highest, and for the remaining
variates, we can choose linear functions of $x$, uncorrelated with $\xi_1$ or with
each other. In choosing the last of these we have no more than $n_1 - 1$ conditions
to be satisfied by the ratios of coefficients. If this is done it is easy
to see, or to demonstrate, that all of the variates $\xi$ except $\xi_1$ have zero correlation
with $y$. Using the variates $\xi$ the sampling distribution of the multiple
correlation $R$ can only depend on the correlation in the population sampled
between $\xi_1$ and $y$, namely, on the multiple correlation in the population
sampled, which we may designate by $\rho$.

* Fisher, ‘Biometrika,’ vol. 10, p. 507 (1915).





The geometrical interpretation of the multiple correlation coefficient $R$ is
that it is the cosine of the angle between the radius vector of the dependent
variate $y$ and the planar region including the radii vectores of the independent
variates; its distribution when $\rho = 0$, which depends only on the geometrical
elements of volume, has been thus shown* to be

$$df = \frac{[\tfrac12(n_1+n_2-2)]!}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\,(R^2)^{\frac12(n_1-2)}\,(1-R^2)^{\frac12(n_2-2)}\,d(R^2),$$

where $n$, the sample number, is replaced by $n_1 + n_2 + 1$; but in what way
this distribution is modified when $\rho$ is not zero has been hitherto entirely
unknown.

It is evident, however, that since we have reduced the problem of the
multiple correlation coefficient to one involving only a single correlation, the
frequency density of any configuration will be affected merely by a factor

$$\frac{[\tfrac12(n-2)]!}{[\tfrac12(n-3)]!\,\sqrt{\pi}}\,(1-\rho^2)^{\frac12(n-1)}\int_{-\infty}^{\infty}\frac{dz}{(\cosh z-\rho r)^{n-1}},$$

in which $r$ is the correlation in the sample between $y$ and $\xi_1$; this factor will,
however, vary, because $r$ varies in the different configurations which give rise
to the same value of $R$. Consider now a third variate, $Y$, representing the
linear function of the independent variates which in the sample is most closely
correlated with $y$, or, in other words, the prediction formula for $y$. Its correlation
with $\xi_1$ we may represent by $\cos\psi$, and since the partial correlation of
$y$ with $\xi_1$ (or any other linear function of the independent variates) when $Y$ is
eliminated, must be zero, it is evident that

$$r = R\cos\psi.$$

For a given value of $\psi$, therefore, the density factor is constant in the different
configurations which give the same value of $R$, but, in the absence of correlation,
the frequency with which $\psi$ falls in the range $d\psi$ is evidently

$$\frac{[\tfrac12(n_1-2)]!}{[\tfrac12(n_1-3)]!\,\sqrt{\pi}}\,\sin^{n_1-2}\psi\,d\psi;$$

hence integrating over all values of $\psi$ the density factor becomes

$$\frac{(1-\rho^2)^{\frac12(n_1+n_2)}}{\pi}\cdot\frac{[\tfrac12(n_1+n_2-1)]!}{[\tfrac12(n_1+n_2-2)]!}\cdot\frac{[\tfrac12(n_1-2)]!}{[\tfrac12(n_1-3)]!}\int_0^{\pi}\!\!\int_{-\infty}^{\infty}\frac{\sin^{n_1-2}\psi\,d\psi\,dz}{(\cosh z-\rho R\cos\psi)^{n_1+n_2}},$$


* Fisher, ‘Phil. Trans.,’ B, vol. 213, p. 89 (1924).




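and the complete expression for the distribution of $R$ is

$$df = \frac{[\tfrac12(n_1+n_2-1)]!}{[\tfrac12(n_2-2)]!\,[\tfrac12(n_1-3)]!}\cdot\frac{(1-\rho^2)^{\frac12(n_1+n_2)}}{\pi}\,(R^2)^{\frac12(n_1-2)}(1-R^2)^{\frac12(n_2-2)}\,d(R^2)\int_0^{\pi}\!\!\int_{-\infty}^{\infty}\frac{\sin^{n_1-2}\psi\,d\psi\,dz}{(\cosh z-\rho R\cos\psi)^{n_1+n_2}}.$$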


3. The Hypergeometric Form. 

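Apart from the factor,

$$(1-\rho^2)^{\frac12(n_1+n_2)},$$

the density factor may be reduced to a hypergeometric function. For in the
expression,

$$\frac{1}{\pi}\cdot\frac{[\tfrac12(n_1+n_2-1)]!}{[\tfrac12(n_1+n_2-2)]!}\cdot\frac{[\tfrac12(n_1-2)]!}{[\tfrac12(n_1-3)]!}\int_0^{\pi}\sin^{n_1-2}\psi\,d\psi\int_{-\infty}^{\infty}\frac{dz}{(\cosh z-\rho R\cos\psi)^{n_1+n_2}},$$

the integrand may be expanded in the uniformly convergent series

$$\sum_{t=0}^{\infty}\frac{(n_1+n_2+2t-1)!}{(n_1+n_2-1)!\,(2t)!}\cdot\frac{\cos^{2t}\psi}{\cosh^{n_1+n_2+2t}z}\,(\rho R)^{2t},$$

in which the odd powers of $\cos\psi$, which evidently disappear on integration,
have been omitted. Remembering now that

$$\int_0^{\pi}\sin^{n_1-2}\psi\,\cos^{2t}\psi\,d\psi = \frac{[\tfrac12(n_1-3)]!\,[\tfrac12(2t-1)]!}{[\tfrac12(n_1+2t-2)]!}$$

and

$$\int_{-\infty}^{\infty}\frac{dz}{\cosh^{n_1+n_2+2t}z} = \frac{[\tfrac12(n_1+n_2+2t-2)]!\,\sqrt{\pi}}{[\tfrac12(n_1+n_2+2t-1)]!},$$

we have a power series for the integral, which may be written

$$\frac{[\tfrac12(n_1-2)]!}{\{[\tfrac12(n_1+n_2-2)]!\}^2}\sum_{t=0}^{\infty}\frac{\{[\tfrac12(n_1+n_2+2t-2)]!\}^2}{t!\,[\tfrac12(n_1+2t-2)]!}\,(\rho R)^{2t},$$

or

$$F[\tfrac12(n_1+n_2),\ \tfrac12(n_1+n_2),\ \tfrac12 n_1,\ \rho^2R^2],$$

so that the distribution of $R$ obtained in section 2 takes the form

$$df = \frac{[\tfrac12(n_1+n_2-2)]!}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\,(1-\rho^2)^{\frac12(n_1+n_2)}\,F[\tfrac12(n_1+n_2),\ \tfrac12(n_1+n_2),\ \tfrac12 n_1,\ \rho^2R^2]$$
$$\times\ (R^2)^{\frac12(n_1-2)}\,(1-R^2)^{\frac12(n_2-2)}\,d(R^2). \qquad \text{(A)}$$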
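Form (A) lends itself to direct computation. The Python sketch below sums the hypergeometric series term by term and checks that the density of $R^2$ integrates to unity. The helper names (`hyp2f1`, `density_A`, `simpson`) and the trial values $n_1 = 4$, $n_2 = 6$, $\rho = 0.5$ are assumptions for illustration; Fisher's $x!$ is again read as $\Gamma(x+1)$.

```python
import math

def hyp2f1(a, b, c, x, tol=1e-16, max_terms=10000):
    """Gauss hypergeometric series F(a, b; c; x), summed until terms are negligible."""
    term = total = 1.0
    for t in range(max_terms):
        term *= (a + t) * (b + t) / ((c + t) * (t + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def density_A(u, n1, n2, rho):
    """Density of u = R^2 according to form (A)."""
    m = n1 + n2
    const = math.gamma(m / 2) / (math.gamma(n1 / 2) * math.gamma(n2 / 2))
    series = hyp2f1(m / 2, m / 2, n1 / 2, rho * rho * u)
    return (const * (1 - rho * rho) ** (m / 2) * series
            * u ** (n1 / 2 - 1) * (1 - u) ** (n2 / 2 - 1))

def simpson(f, a, b, m=2000):
    """Composite Simpson rule on [a, b] with an even number m of sub-intervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

total_mass = simpson(lambda u: density_A(u, 4, 6, 0.5), 0.0, 1.0)
print(total_mass)
```

When $\rho = 0$ the series reduces to its first term and the density falls back to the form already known for uncorrelated material.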




4. Elementary Cases. 

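4.1. When $n_2$ is even. When $n_2$ is even the identity,

$$F[\tfrac12(n_1+n_2),\ \tfrac12(n_1+n_2),\ \tfrac12 n_1,\ \rho^2R^2] = (1-\rho^2R^2)^{-\frac12(n_1+2n_2)}\,F(-\tfrac12 n_2,\ -\tfrac12 n_2,\ \tfrac12 n_1,\ \rho^2R^2),$$

gives a terminating series.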
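The identity is an instance of Euler's transformation of the hypergeometric function, and may be verified numerically. The following Python sketch compares the two sides for one trial case; the function names and the values $n_1 = 3$, $n_2 = 4$, $\rho^2R^2 = 0.3$ are illustrative assumptions only.

```python
def hyp2f1(a, b, c, x, tol=1e-16, max_terms=2000):
    """Gauss series F(a, b; c; x); it stops by itself when a is a negative integer."""
    term = total = 1.0
    for t in range(max_terms):
        term *= (a + t) * (b + t) / ((c + t) * (t + 1)) * x
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def both_sides(n1, n2, x):
    """Left and right sides of the identity, with x standing for rho^2 R^2."""
    m = n1 + n2
    lhs = hyp2f1(m / 2, m / 2, n1 / 2, x)
    rhs = (1 - x) ** (-(n1 + 2 * n2) / 2) * hyp2f1(-n2 / 2, -n2 / 2, n1 / 2, x)
    return lhs, rhs

lhs, rhs = both_sides(3, 4, 0.3)
print(lhs, rhs)
```

With $n_2 = 4$ the right-hand series has only three terms, whereas the left-hand series has to be summed to convergence.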

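Thus, when $n_2 = 2$, we have the series of distributions

$$df = \tfrac12\,(1-\rho^2)^{\frac12(n_1+2)}\,(1-\rho^2R^2)^{-\frac12(n_1+4)}\,(n_1+2\rho^2R^2)\,(R^2)^{\frac12(n_1-2)}\,d(R^2),$$

having the special forms

$$(2.2)\qquad df = (1-\rho^2)^2\,(2+2\rho^2R^2)/(1-\rho^2R^2)^3 \,.\, R\,dR,$$

$$(3.2)\qquad df = (1-\rho^2)^{5/2}\,(3+2\rho^2R^2)/(1-\rho^2R^2)^{7/2} \,.\, R^2\,dR,$$

$$(4.2)\qquad df = (1-\rho^2)^3\,(4+2\rho^2R^2)/(1-\rho^2R^2)^4 \,.\, R^3\,dR$$

when $n_1$ is 2, 3 and 4.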

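When $n_2 = 4$, we have a somewhat less simple series of distributions

$$df = \frac{2\,(1-\rho^2)^{\frac12(n_1+4)}}{(1-\rho^2R^2)^{\frac12(n_1+8)}}\left\{\frac{n_1(n_1+2)}{2.4} + 2\,\frac{n_1+2}{2}\,\rho^2R^2 + \rho^4R^4\right\}R^{n_1-2}\,(1-R^2)\,d(R^2),$$

and when $n_2 = 6$, a series which may be written

$$df = \frac{3\,(1-\rho^2)^{\frac12(n_1+6)}}{(1-\rho^2R^2)^{\frac12(n_1+12)}}\left\{\frac{n_1(n_1+2)(n_1+4)}{2.4.6} + 3\,\frac{(n_1+2)(n_1+4)}{2.4}\,\rho^2R^2 + 3\,\frac{n_1+4}{2}\,\rho^4R^4 + \rho^6R^6\right\}R^{n_1-2}\,(1-R^2)^2\,d(R^2),$$

an expression in which the general method of formation of the terms is readily
seen.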

4.2. When $n_1$ and $n_2$ are both odd. A second group of cases in which the
frequency element is expressible in finite terms in elementary functions occurs
when both $n_1$ and $n_2$ are odd. If, for example, we put $n_1 = 3$ in the expression

$$\int_0^{\pi}\sin^{n_1-2}\psi\,d\psi\int_{-\infty}^{\infty}\frac{dz}{(\cosh z-\rho R\cos\psi)^{n_1+n_2}}$$

and integrate with respect to $\cos\psi$, we obtain

$$\int_{-\infty}^{\infty}\frac{dz}{(n_2+2)\,\rho R}\left\{(\cosh z-\rho R)^{-(n_2+2)} - (\cosh z+\rho R)^{-(n_2+2)}\right\},$$

a form of integral which, as was shown in the case of the simple correlation
coefficient*, is expressible in finite terms by the aid of the circular functions.

* Fisher, ‘Biometrika,’ vol. 10, p. 507 (1915).




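For

$$\int_{-\infty}^{\infty}\frac{dz}{\cosh z-\rho R} = \frac{2\theta}{\sin\theta},$$

where $\cos\theta = -\rho R$, and $\theta$ does not exceed the bounds 0 to $\pi$; hence

$$\int_{-\infty}^{\infty}\frac{dz}{(\cosh z-\rho R)^{n_2+2}} = \frac{2}{(n_2+1)!}\left(-\frac{d}{d\cos\theta}\right)^{n_2+1}\frac{\theta}{\sin\theta},$$

and therefore, if $n_2$ is odd,

$$\int_{-\infty}^{\infty}\frac{dz}{(n_2+2)\,\rho R}\left\{(\cosh z-\rho R)^{-(n_2+2)} - (\cosh z+\rho R)^{-(n_2+2)}\right\} = \frac{4}{(n_2+2)!\,\rho R}\left(\frac{d}{d(\rho R)}\right)^{n_2+1}\frac{\sin^{-1}\rho R}{\sqrt{1-\rho^2R^2}}.$$

Hence for the determination of the simpler distributions of this series we
require, writing $\phi$ for $\sin^{-1}\rho R$ and $t$ for $\tan\phi$,

$$\left(\frac{1}{\cos\phi}\frac{d}{d\phi}\right)^{2}\frac{\phi}{\cos\phi} = \frac{1}{\cos^3\phi}\,(\phi + 3\tan\phi + 3\phi\tan^2\phi),$$

$$\left(\frac{1}{\cos\phi}\frac{d}{d\phi}\right)^{4}\frac{\phi}{\cos\phi} = \frac{1}{\cos^5\phi}\,(9\phi + 55t + 90\phi t^2 + 105t^3 + 105\phi t^4),$$

$$\left(\frac{1}{\cos\phi}\frac{d}{d\phi}\right)^{6}\frac{\phi}{\cos\phi} = \frac{9}{\cos^7\phi}\,(25\phi + 231t + 525\phi t^2 + 1190t^3 + 1575\phi t^4 + 1155t^5 + 1155\phi t^6),$$

which lead directly to the distributions,

$$(3.1)\qquad df = \frac{1}{\pi}\,(1-\rho^2)^2\,(1-R^2)^{-\frac12}\,(1-\rho^2R^2)^{-2}\,\{3 + \alpha\,(1+2\rho^2R^2)\}\,R^2\,dR,$$

in which $\alpha$ stands for

$$\frac{\sin^{-1}\rho R}{\rho R\sqrt{1-\rho^2R^2}} = 1 + \tfrac{2}{3}\rho^2R^2 + \tfrac{8}{15}\rho^4R^4 + \ldots \text{ ad inf.}$$

$$(3.3)\qquad df = \frac{1}{4\pi}\,(1-\rho^2)^3\,(1-R^2)^{\frac12}\,(1-\rho^2R^2)^{-4}\,R^2\,dR \times \{5\,(11+10\rho^2R^2) + 3\alpha\,(3+24\rho^2R^2+8\rho^4R^4)\},$$

$$(3.5)\qquad df = \frac{1}{8\pi}\,(1-\rho^2)^4\,(1-R^2)^{\frac32}\,(1-\rho^2R^2)^{-6}\,R^2\,dR \times \{7\,(33+104\rho^2R^2+28\rho^4R^4) + 5\alpha\,(5+90\rho^2R^2+120\rho^4R^4+16\rho^6R^6)\}.$$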






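A similar process of integration is available for other odd values of $n_1$; for
$n_1 = 5$ we have the distributions

$$(5.1)\qquad df = \frac{(1-\rho^2)^3}{4\pi\rho^2}\,(1-R^2)^{-\frac12}\,(1-\rho^2R^2)^{-3}\,R^2\,dR \times \{1 + 14\rho^2R^2 + \alpha\,(-1 + 8\rho^2R^2 + 8\rho^4R^4)\},$$

$$(5.3)\qquad df = \frac{3\,(1-\rho^2)^4}{8\pi\rho^2}\,(1-R^2)^{\frac12}\,(1-\rho^2R^2)^{-5}\,R^2\,dR \times \{1 + 68\rho^2R^2 + 36\rho^4R^4 + \alpha\,(-1 + 18\rho^2R^2 + 72\rho^4R^4 + 16\rho^6R^6)\},$$

$$(5.5)\qquad df = \frac{(1-\rho^2)^5}{64\pi\rho^2}\,(1-R^2)^{\frac32}\,(1-\rho^2R^2)^{-7}\,R^2\,dR \times \{25 + 4678\rho^2R^2 + 8664\rho^4R^4 + 1648\rho^6R^6 + \alpha\,(-25 + 800\rho^2R^2 + 7200\rho^4R^4 + 6400\rho^6R^6 + 640\rho^8R^8)\}.$$

The polynomial coefficient of $\alpha$ in the hypergeometric function is itself easily
expressed in terms of a function of this sort, in the forms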


i (2 — n,), p*R*], 
or 


2.4 ... 


(p2R2)H".+i) y\ ^ 


n^-j-ng—2 




from this the remainder may in any particular case be found fairly easily by
equating coefficients in the initial terms of the expansion of

$$F[\tfrac12(n_1+n_2),\ \tfrac12(n_1+n_2),\ \tfrac12 n_1,\ \rho^2R^2].$$


5. The Problem of Large Samples.

Some confusion has been caused by the fact that, while for any finite value
of $\rho$, however small, the distribution of $R$ will be normal for a sufficiently
large sample, yet when $\rho = 0$ the distribution is far from normal. The
approximate distribution appropriate to the theory of large samples, for
different values of $\rho\sqrt{n_2}$, may be found as follows.

If we write $n_2\rho^2 = \beta^2$, $n_2R^2 = B^2$, and allow $n_2$ to increase indefinitely, the
limiting form taken by the general distribution is

$$df = \frac{(\tfrac12B^2)^{\frac12(n_1-2)}}{[\tfrac12(n_1-2)]!}\,e^{-\frac12B^2-\frac12\beta^2}\left\{1 + \frac{\beta^2B^2}{n_1\,.\,2} + \frac{\beta^4B^4}{n_1(n_1+2)\,2.4} + \ldots\right\}d(\tfrac12B^2), \qquad \text{(B)}$$

which may be written in terms of a Bessel function as

$$df = (B/i\beta)^{\frac12(n_1-2)}\,e^{-\frac12B^2-\frac12\beta^2}\,J_{\frac12(n_1-2)}(i\beta B)\,d(\tfrac12B^2).$$




When $n_1$ is odd, these may be reduced to elementary functions; thus for
$n_1 = 3$, we have

$$df = \frac{B}{\sqrt{2\pi}\,\beta}\left\{e^{-\frac12(B-\beta)^2} - e^{-\frac12(B+\beta)^2}\right\}dB,$$

an interesting distribution which connects the extreme forms found by making
$\beta$ zero for uncorrelated populations, and large for populations with a significant
though still small correlation. When $\beta = 0$, we have

$$df = (2/\pi)^{\frac12}\,B^2\exp(-\tfrac12B^2)\,dB,$$

the distribution of $\chi$ for 3 degrees of freedom, while when $\beta$ is large, $B$ is
distributed normally about $\beta$, in the form

$$df = (2\pi)^{-\frac12}\exp\{-\tfrac12(B-\beta)^2\}\,dB,$$

and therefore $R$ is distributed normally about $\rho$, with variance which may be
equated to $1/n_2$.
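The reduction for $n_1 = 3$ can be checked against the general series (B). The Python sketch below evaluates both as densities in $B$ (so that $d(\tfrac12B^2)$ is replaced by $B\,dB$) and compares them; the function names and the trial values are illustrative assumptions.

```python
import math

def density_B_series(B, n1, beta, terms=200):
    """Density of B from the limiting series (B), with d(B^2/2) written as B dB."""
    nu = n1 / 2
    term = total = 1.0
    for t in range(terms):
        # Successive terms of {1 + beta^2 B^2/(n1.2) + ...}.
        term *= (beta * B / 2) ** 2 / ((nu + t) * (t + 1))
        total += term
    lead = (B * B / 2) ** (nu - 1) / math.gamma(nu)
    return lead * math.exp(-0.5 * (B * B + beta * beta)) * total * B

def density_B_n3(B, beta):
    """Elementary form for n1 = 3 (beta must be non-zero)."""
    c = B / (math.sqrt(2 * math.pi) * beta)
    return c * (math.exp(-0.5 * (B - beta) ** 2) - math.exp(-0.5 * (B + beta) ** 2))

series_value = density_B_series(2.0, 3, 1.5)
closed_value = density_B_n3(2.0, 1.5)
chi3_value = density_B_series(1.0, 3, 0.0)   # beta = 0: distribution of chi, 3 d.f.
print(series_value, closed_value, chi3_value)
```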

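When $n_1 = 5$, the system of distributions is

$$df = \frac{B^2}{\sqrt{2\pi}\,\beta^2}\left\{\left(1-\frac{1}{\beta B}\right)\exp\{-\tfrac12(B-\beta)^2\} + \left(1+\frac{1}{\beta B}\right)\exp\{-\tfrac12(B+\beta)^2\}\right\}dB,$$

and when $n_1 = 7$

$$df = \frac{B^3}{\sqrt{2\pi}\,\beta^3}\left\{\left(1-\frac{3}{\beta B}+\frac{3}{\beta^2B^2}\right)\exp\{-\tfrac12(B-\beta)^2\} - \left(1+\frac{3}{\beta B}+\frac{3}{\beta^2B^2}\right)\exp\{-\tfrac12(B+\beta)^2\}\right\}dB.$$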


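In the cases in which $n_1$ is even, the probability of exceeding a given value $B$
may be written

$$e^{-\frac12\beta^2}\sum_{t=0}^{\infty}\frac{(\tfrac12\beta^2)^t}{t!}\int_B^{\infty}\frac{B^{n_1+2t-1}\,e^{-\frac12B^2}}{2^{\frac12(n_1+2t-2)}\,[\tfrac12(n_1+2t-2)]!}\,dB;$$

using the fact that when $k$ is odd

$$\int_B^{\infty}\frac{B^{k}\,e^{-\frac12B^2}}{2.4\ldots(k-1)}\,dB = e^{-\frac12B^2}\left(1 + \frac{B^2}{2} + \frac{B^4}{2.4} + \ldots + \frac{B^{k-1}}{2.4\ldots(k-1)}\right),$$

the integral becomes

$$e^{-\frac12\beta^2-\frac12B^2}\sum_{t=0}^{\infty}\frac{(\tfrac12\beta^2)^t}{t!}\sum_{u=0}^{\frac12(n_1+2t-2)}\frac{(\tfrac12B^2)^u}{u!},$$

involving only the terms of two Poisson Series with mean values $\tfrac12\beta^2$ and $\tfrac12B^2$.
If $t$ and $u$ be regarded as variates distributed independently in two such series,
the probability may be identified with the probability that $u$ should not
exceed $t$ by $\tfrac12 n_1$ or more.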
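The Poisson interpretation gives a convenient way of computing the probability for even $n_1$. The Python sketch below sums the double series directly and compares the result with the 5 per cent. level at three tabulated points of Table I. The function name, the truncation at 200 terms and the tolerances are assumptions made for illustration.

```python
import math

def tail_prob_even(n1, beta, B, terms=200):
    """P(value exceeds B) for even n1, as P(u - t < n1/2) with t ~ Poisson(beta^2/2)
    and u ~ Poisson(B^2/2) independent."""
    if n1 % 2:
        raise ValueError("this finite form needs an even n1")
    lam_t, lam_u = 0.5 * beta * beta, 0.5 * B * B
    w_t = math.exp(-lam_t)      # Poisson weight of t = 0
    w_u = math.exp(-lam_u)      # Poisson weight of u = 0
    cdf_u, u = 0.0, 0           # running P(U <= u - 1)
    total = 0.0
    for t in range(terms):
        upper = t + n1 // 2 - 1
        while u <= upper:       # extend the cumulative sum up to u = upper
            cdf_u += w_u
            u += 1
            w_u *= lam_u / u
        total += w_t * cdf_u
        w_t *= lam_t / (t + 1)
    return total

p_a = tail_prob_even(2, 0.0, 2.4477)
p_b = tail_prob_even(2, 1.0, 2.9398)
p_c = tail_prob_even(4, 0.0, 3.0802)
print(p_a, p_b, p_c)
```

Each of the three values should lie close to 0.05, since the arguments are taken from the tabulated 5 per cent. points for $n_1 = 2$ and $n_1 = 4$.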

The distributions developed in this section are limiting forms appropriate
to large samples, in which exact account is taken of the positive bias of small
observed multiple correlations; they will provide at least an approximate
treatment of those cases of great practical importance in which $n_2$ does not
exceed 100, and in which, therefore, the positive bias is prominent for observed
values of $R$ which are not small. The fact that sampling errors of the simple
correlation coefficient have been successfully represented by a normal distribution
by means of the transformation $z = \tanh^{-1} r$, suggests that pending
fuller tests than are at present practicable, the transformation

$$B = \sqrt{n_2}\,\tanh^{-1}R, \qquad \beta = \sqrt{n_2}\,\tanh^{-1}\rho,$$

will supply tests of significance of precision, sufficient for practical purposes,
in the important region alluded to.
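The suggested transformation is simple to apply. The Python sketch below converts an observed $R$ and a hypothetical $\rho$ into the quantities $B$ and $\beta$ that enter the table of $B$; the function name and the example values ($R = 0.5$, $\rho = 0$, $n_2 = 36$) are hypothetical and serve only to show the arithmetic.

```python
import math

def to_B_beta(R, rho, n2):
    """Transformed quantities B = sqrt(n2) * atanh(R) and beta = sqrt(n2) * atanh(rho)."""
    root = math.sqrt(n2)
    return root * math.atanh(R), root * math.atanh(rho)

B, beta = to_B_beta(0.5, 0.0, 36)
print(B, beta)
```

The resulting $B$ would then be compared with the tabulated 5 per cent. point for the relevant $n_1$ and $\beta$.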

Table I (table of B) shows the 5 per cent. points of these distributions, for


Table of 5 per cent. points of the distribution of B.

Values                          Values of n_1
of β         1.       2.       3.       4.       5.       6.       7.

0.0        1.9600   2.4477   2.7955   3.0802   3.3272   3.5485   3.7506
0.2        1.9985   2.4720   2.8140   3.0955   3.3405   3.5602   3.7613
0.4        2.1070   2.5419   2.8680   3.1405   3.3796   3.5951   3.7930
0.6        2.2654   2.6497   2.9533   3.2125   3.4426   3.6517   3.8445
0.8        2.4505   2.7855   3.0640   3.3076   3.5268   3.7278   3.9144
1.0        2.6461   2.9398   3.1941   3.4216   3.6291   3.8210   4.0006
1.2        2.8451   3.1059   3.3386   3.5505   3.7462   3.9289   4.1008
1.4        3.0449   3.2796   3.4936   3.6911   3.8756   4.0491   4.2134
1.6        3.2449   3.4584   3.6561   3.8408   4.0148   4.1796   4.3363
1.8        3.4449   3.6410   3.8246   3.9978   4.1620   4.3184   4.4681
2.0        3.6449   3.8263   3.9976   4.1604   4.3158   4.4645   4.6074
2.2        3.8449   4.0137   4.1743   4.3278   4.4750   4.6166   4.7531
2.4        4.0449   4.2027   4.3539   4.4990   4.6388   4.7738   4.9043
2.6        4.2449   4.3932   4.5359   4.6735   4.8065   4.9353   5.0603
2.8        4.4449   4.5847   4.7199   4.8506   4.9774   5.1006   5.2204
3.0        4.6449   4.7772   4.9055   5.0301   5.1512   5.2691   5.3840
3.2        4.8449   4.9705   5.0926   5.2116   5.3273   5.4404   5.5508
3.4        5.0449   5.1644   5.2809   5.3946   5.5056   5.6142   5.7204
3.6        5.2449   5.3589   5.4703   5.5792   5.6857   5.7901   5.8924
3.8        5.4449   5.5539   5.6606   5.7650   5.8676   5.9679   6.0665
4.0        5.6449   5.7493   5.8516   5.9521   6.0506   6.1476   6.2426
4.2        5.8449   5.9451   6.0434   6.1401   6.2351   6.3285   6.4204
4.4        6.0449   6.1412   6.2359   6.3290   6.4206   6.5109   6.5998
4.6        6.2449   6.3376   6.4288   6.5187   6.6072   6.6945   6.7806
4.8        6.4449   6.5342   6.6223   6.7091   6.7947   6.8792   6.9626
5.0        6.6449   6.7311   6.8162   6.9002   6.9831   7.0649   7.1457









values of $\beta$ from 0 to 5 and of $n_1$ from 1 to 7. The values tabulated are the
values of $B$ which will be exceeded by chance in 5 per cent. of random trials, and
which therefore give a presumption that $\beta$ is really greater than the value
postulated. Thus, when $n_1 = 3$, it may be seen at a glance that a value
$B = 5.7$ indicates that $\beta$ probably exceeds 3.8.

For a great part of the labour of constructing this Table I am indebted to
Mr. A. J. Page, I.C.S., whose assistance in my laboratory while on leave has 
thus enabled me to press forward with the theoretical investigation of the new 
distributions. 


6. The Probability Integral. 

For calculations involving finite probabilities of occurrence, including tests
whether an observed $R$ is or is not significantly discrepant from a hypothetical
$\rho$, it is not the frequency element but its integral that is required. It is
fortunate that the frequency distribution we have found when $n_2$ is even leads
to a probability integral of a tolerably simple form.

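The frequency element

$$\frac{[\tfrac12(n_1+n_2-2)]!}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\,(1-\rho^2)^{\frac12(n_1+n_2)}\,F[\tfrac12(n_1+n_2),\ \tfrac12(n_1+n_2),\ \tfrac12 n_1,\ \rho^2R^2]\,(R^2)^{\frac12(n_1-2)}\,(1-R^2)^{\frac12(n_2-2)}\,d(R^2)$$

may be written

$$(1-\rho^2)^{\frac12(n_1+n_2)}\sum_{t=0}^{\infty}\frac{[\tfrac12(n_1+n_2+2t-2)]!}{[\tfrac12(n_1+n_2-2)]!\,t!}\,\rho^{2t}\cdot\frac{[\tfrac12(n_1+n_2+2t-2)]!}{[\tfrac12(n_1+2t-2)]!\,[\tfrac12(n_2-2)]!}\,(R^2)^{\frac12(n_1+2t-2)}\,(1-R^2)^{\frac12(n_2-2)}\,d(R^2);$$

but if $n_2$ is even

$$\int_0^{R^2}\frac{[\tfrac12(n_1+n_2+2t-2)]!}{[\tfrac12(n_1+2t-2)]!\,[\tfrac12(n_2-2)]!}\,(R^2)^{\frac12(n_1+2t-2)}\,(1-R^2)^{\frac12(n_2-2)}\,d(R^2)$$

is

$$R^{n_1+2t}\left\{1 + \frac{n_1+2t}{2}\,(1-R^2) + \frac{(n_1+2t)(n_1+2t+2)}{2.4}\,(1-R^2)^2 + \ldots + \frac{(n_1+2t)\ldots(n_1+2t+n_2-4)}{2.4\ldots(n_2-2)}\,(1-R^2)^{\frac12(n_2-2)}\right\}$$

$$= R^{n_1+2t}\sum_{p=0}^{\frac12(n_2-2)}\frac{(1-R^2)^p}{p!}\cdot\frac{[\tfrac12(n_1+2t+2p-2)]!}{[\tfrac12(n_1+2t-2)]!}.$$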



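Again

$$\sum_{t=0}^{\infty}\frac{[\tfrac12(n_1+n_2+2t-2)]!}{[\tfrac12(n_1+n_2-2)]!\,t!}\,\rho^{2t}\,\frac{[\tfrac12(n_1+2t+2p-2)]!}{[\tfrac12(n_1+2t-2)]!}\,(R^2)^{\frac12(n_1+2t)}$$

is

$$R^{n_1}\,\frac{[\tfrac12(n_1+2p-2)]!}{[\tfrac12(n_1-2)]!}\,F\left(\frac{n_1+n_2}{2},\ \frac{n_1+2p}{2},\ \frac{n_1}{2},\ \rho^2R^2\right)$$

or

$$\frac{[\tfrac12(n_1+2p-2)]!}{[\tfrac12(n_1-2)]!}\cdot\frac{R^{n_1}}{(1-\rho^2R^2)^{\frac12(n_1+n_2+2p)}}\,F\left(-\frac{n_2}{2},\ -p,\ \frac{n_1}{2},\ \rho^2R^2\right),$$

which terminates in $p + 1$ terms, and is equivalent to

$$\frac{(\tfrac12 n_2)!}{[\tfrac12(n_2-2p)]!}\cdot\frac{R^{n_1}\,(\rho^2R^2)^p}{(1-\rho^2R^2)^{\frac12(n_1+n_2+2p)}}\,F\left(-p,\ -\frac{n_1+2p-2}{2},\ \frac{n_2-2p+2}{2},\ \frac{1}{\rho^2R^2}\right)$$

or

$$\frac{(\tfrac12 n_2)!}{[\tfrac12(n_2-2p)]!}\cdot\frac{R^{n_1}\,(-)^p}{(1-\rho^2R^2)^{\frac12(n_1+n_2)}}\,F\left(-p,\ \frac{n_1+n_2}{2},\ \frac{n_2-2p+2}{2},\ \frac{1}{1-\rho^2R^2}\right).$$

The probability integral, when $n_2$ is even, may therefore be written in the
forms

$$\left(\frac{1-\rho^2}{1-\rho^2R^2}\right)^{\frac12(n_1+n_2)}R^{n_1}\sum_{p=0}^{\frac12(n_2-2)}\frac{[\tfrac12(n_1+2p-2)]!}{[\tfrac12(n_1-2)]!\,p!}\left(\frac{1-R^2}{1-\rho^2R^2}\right)^{p}F(-p,\ -\tfrac12 n_2,\ \tfrac12 n_1,\ \rho^2R^2),$$

$$\left(\frac{1-\rho^2}{1-\rho^2R^2}\right)^{\frac12(n_1+n_2)}R^{n_1}\sum_{p=0}^{\frac12(n_2-2)}\frac{(\tfrac12 n_2)!}{[\tfrac12(n_2-2p)]!\,p!}\,(-)^p\,(1-R^2)^p\,F\left(-p,\ \frac{n_1+n_2}{2},\ \frac{n_2-2p+2}{2},\ \frac{1}{1-\rho^2R^2}\right),$$

both of which terminate in $\tfrac18 n_2(n_2+2)$ elementary terms.

When $n_2 = 2$, we have the simple probability integral

$$\{(1-\rho^2)/(1-\rho^2R^2)\}^{\frac12(n_1+2)}\,R^{n_1};$$

when $n_2 = 4$, it becomes

$$\left(\frac{1-\rho^2}{1-\rho^2R^2}\right)^{\frac12(n_1+4)}\left\{\frac{n_1+4}{2}\cdot\frac{1-R^2}{1-\rho^2R^2} - (1-2R^2)\right\}R^{n_1},$$

and, when $n_2 = 6$,

$$\left(\frac{1-\rho^2}{1-\rho^2R^2}\right)^{\frac12(n_1+6)}\left\{\frac{(n_1+6)(n_1+8)}{2.4}\left(\frac{1-R^2}{1-\rho^2R^2}\right)^2 - \frac{n_1+6}{2}\cdot\frac{1-R^2}{1-\rho^2R^2}\,(2-3R^2) + (1-3R^2+3R^4)\right\}R^{n_1}.$$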
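The finite forms of the probability integral are easy to evaluate. The Python sketch below implements the first general form for even $n_2$ and compares it with the special expressions for $n_2 = 2$ and $n_2 = 4$; it also checks that the integral reaches unity at $R = 1$. The function names and trial values are assumptions for illustration, and Fisher's $x!$ is read as $\Gamma(x+1)$.

```python
import math

def hyp_poly(p, b, c, x):
    """Terminating series F(-p, b; c; x) for a non-negative integer p."""
    term = total = 1.0
    for t in range(p):
        term *= (-p + t) * (b + t) / ((c + t) * (t + 1)) * x
        total += term
    return total

def prob_integral(R, n1, n2, rho):
    """Probability that the sample multiple correlation falls below R (n2 even)."""
    if n2 % 2:
        raise ValueError("the finite form needs an even n2")
    x = (rho * R) ** 2
    ratio = (1 - R * R) / (1 - x)
    total = 0.0
    for p in range(n2 // 2):
        coef = math.gamma(n1 / 2 + p) / (math.gamma(n1 / 2) * math.factorial(p))
        total += coef * ratio ** p * hyp_poly(p, -n2 / 2, n1 / 2, x)
    return ((1 - rho * rho) / (1 - x)) ** ((n1 + n2) / 2) * R ** n1 * total

def prob_integral_n2_2(R, n1, rho):
    """Special form for n2 = 2."""
    return ((1 - rho * rho) / (1 - (rho * R) ** 2)) ** ((n1 + 2) / 2) * R ** n1

def prob_integral_n2_4(R, n1, rho):
    """Special form for n2 = 4."""
    x = (rho * R) ** 2
    bracket = (n1 + 4) / 2 * (1 - R * R) / (1 - x) - (1 - 2 * R * R)
    return ((1 - rho * rho) / (1 - x)) ** ((n1 + 4) / 2) * bracket * R ** n1

print(prob_integral(0.6, 3, 4, 0.5), prob_integral_n2_4(0.6, 3, 0.5))
```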

It should be observed that the coefficient of $\{(1-R^2)/(1-\rho^2R^2)\}^p$ is given by

a (n,-f n, + 2p--2)3! ( (RH-- RN 

li (^j + «2 — 2)] ! dx^ I X f 

when $x = -1$.

7. Extension of the Analysis of Variance.

The distribution of the simple correlation coefficient, although one of the
first sampling distributions to be determined with exactitude*, has
hitherto occupied a somewhat isolated position. For all the exact distributions
of statistics since discovered have grouped themselves in a single
system; they are all amenable to the same technical procedure known as the
analysis of variance; and all may be reduced to an equivalent problem of the
distribution of the difference of the logarithms of two independent estimates of
variance, based respectively upon $n_1$ and $n_2$ degrees of freedom.

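The distribution of such an estimate $s_1^2$ derived from $n_1$ degrees of freedom
is given by

$$df = \frac{1}{[\tfrac12(n_1-2)]!}\,\xi_1^{\frac12(n_1-2)}\,e^{-\xi_1}\,d\xi_1,$$

where $\xi_1 = n_1 s_1^2/2\sigma^2$, and $\sigma$ is the parameter of which $s_1$ is the first estimate.

If, now, $\xi_2 = n_2 s_2^2/2\sigma^2$ and

$$z = \log s_1 - \log s_2,$$

it follows that

$$\xi_1 = (n_1/n_2)\,e^{2z}\,\xi_2$$

and the simultaneous distribution

$$df = \frac{1}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\,\xi_1^{\frac12(n_1-2)}\,\xi_2^{\frac12(n_2-2)}\,e^{-\xi_1-\xi_2}\,d\xi_1\,d\xi_2$$

may be written

$$df = \frac{2}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\left(\frac{n_1}{n_2}\right)^{\frac12 n_1}\xi_2^{\frac12(n_1+n_2-2)}\,e^{n_1 z}\,\exp\left\{-\xi_2\left(1+\frac{n_1}{n_2}\,e^{2z}\right)\right\}d\xi_2\,dz;$$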

* Fisher, ‘Biometrika,’ vol. 10, p. 507 (1915).




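this expression may be integrated with respect to $\xi_2$ to yield the distribution
of $z$ in the form

$$df = \frac{2\,[\tfrac12(n_1+n_2-2)]!}{[\tfrac12(n_1-2)]!\,[\tfrac12(n_2-2)]!}\cdot\frac{n_1^{\frac12 n_1}\,n_2^{\frac12 n_2}\,e^{n_1 z}}{(n_2+n_1e^{2z})^{\frac12(n_1+n_2)}}\,dz,$$

completely independent of the unknown variance.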

By the insertion of the appropriate values of $n_1$ and $n_2$, including the
important bounding values of unity and infinity, the appropriate distribution
of $z$ for the analysis of variance is obtained. In the case, for example, of the
multiple correlation coefficient drawn from uncorrelated material, $n_1$ is equated
to the number of independent variates, $n_1 + n_2 + 1$ to the sample number, and
$2z$ to

$$\log(R^2/n_1) - \log\{(1-R^2)/n_2\}.$$
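The correspondence between $R$ and $z$ can be written out in a few lines. The Python sketch below computes $z$ from an observed $R$ and recovers the ratio of the two variance estimates, $e^{2z}$; the function name and the numerical example ($R^2 = 0.36$, $n_1 = 2$, $n_2 = 10$) are hypothetical.

```python
import math

def z_from_R(R, n1, n2):
    """z such that 2z = log(R^2 / n1) - log((1 - R^2) / n2)."""
    return 0.5 * (math.log(R * R / n1) - math.log((1 - R * R) / n2))

z = z_from_R(0.6, 2, 10)
variance_ratio = math.exp(2 * z)   # (R^2 / n1) / ((1 - R^2) / n2)
print(z, variance_ratio)
```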


It was from the first obvious that this system was capable without formal
modification of extension to the case in which $s_1$ and $s_2$ were estimates of two
different parameters $\sigma_1$ and $\sigma_2$; for in such cases we have only to write
$\zeta = \log\sigma_1 - \log\sigma_2$, and the distribution found above will be that appropriate
to the variate $z - \zeta$.

The new system of distributions found for the multiple correlation coefficient
derived from correlated material is not only a generalisation of that previously
found* for the simple correlation coefficient, but provides an extension of a
different kind from that mentioned above to the analysis of variance. For the
limiting distribution found in section 5 (distribution of B) may be interpreted
as the distribution of the sum of the squares of $n_1$ variates normally distributed
with equal variance, but not with zero means as in all cases previously discussed.

To show this, let

$$T = \frac{1}{2\sigma^2}\sum_{p=1}^{n_1}(x_p - a_p)^2,$$

in which $x_1, \ldots, x_{n_1}$ are variates distributed independently about zero with
common variance $\sigma^2$. Let $\xi = S(ax)/\sigma\sqrt{S(a^2)}$, then $\xi$ will be normally
distributed about zero with unit variance, and if we write $\chi^2$ for
$2T - \{\xi - \sqrt{S(a^2)}/\sigma\}^2$ or $S(x^2)/\sigma^2 - \xi^2$,
which is the sum of the squares of $(n_1 - 1)$ quantities independently distributed
about zero with unit variance, it appears that the distribution of $\chi^2$ is of the
familiar form

$$\frac{1}{[\tfrac12(n_1-3)]!}\,(\tfrac12\chi^2)^{\frac12(n_1-3)}\,e^{-\frac12\chi^2}\,d(\tfrac12\chi^2),$$

and is independent of that of $\xi$, namely $(2\pi)^{-\frac12}\,e^{-\frac12\xi^2}\,d\xi$.

* Fisher, ‘ Biometrika,’ vol. 10, p. 507 (1915). 





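If, now, $\beta$ is written for $\sqrt{S(a^2)}/\sigma$ and $x$ for $\xi - \beta$, it follows that

$$2T = \chi^2 + x^2,$$

and as the same value of $x^2$ is provided by the two values of $\xi$, $\beta \pm x$, the
frequency element required from the distribution of $\xi$ is

$$\frac{1}{\sqrt{2\pi}} \{e^{-\frac12(x+\beta)^2} + e^{-\frac12(x-\beta)^2}\}\, dx,$$

only positive values of $x$ being now considered. Substituting for $\chi^2$ in terms of
T and $x$, the frequency distribution of the latter two variates will be given by

$$df = \frac{2}{\sqrt{2\pi}} \cdot \frac{1}{[\frac12(n_1-3)]!}\, (T - \tfrac12 x^2)^{\frac12(n_1-3)}\, e^{-T - \frac12\beta^2} \cosh(\beta x)\, dT\, dx.$$
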
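For a given value of T, the variate $x$ cannot exceed $\sqrt{2T}$, and the random
sampling distribution of T is therefore found by integrating between 0 and $\sqrt{2T}$.
Expanding the hyperbolic cosine in powers of $x$, and integrating term by term,
since

$$\int_0^{\sqrt{2T}} (T - \tfrac12 x^2)^{\frac12(n_1-3)}\, x^{2p}\, dx = \sqrt{\frac{\pi}{2}} \cdot \frac{[\frac12(n_1-3)]!\, (2p)!}{[\frac12(n_1+2p-2)]!\, 2^p\, p!}\, T^{\frac12(n_1+2p-2)},$$

we have the distribution of T in the form

$$df = e^{-T-\frac12\beta^2} \sum_{p=0}^{\infty} \frac{T^{\frac12(n_1+2p-2)}\, \beta^{2p}}{[\frac12(n_1+2p-2)]!\, 2^p\, p!}\, dT$$

$$= e^{-T-\frac12\beta^2}\, \frac{T^{\frac12(n_1-2)}}{[\frac12(n_1-2)]!} \left\{1 + \frac{1}{n_1}(T\beta^2) + \frac{1}{n_1(n_1+2)} \cdot \frac{(T\beta^2)^2}{2!} + \ldots\right\} dT,$$

which is the B-distribution of section 5 if T is equated to $\frac12 B^2$.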
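The series for the distribution of T can be checked numerically. The sketch below (Python, standard library only) sums the series with $[x]! = \Gamma(x+1)$ and verifies, by trapezoidal integration, that the total frequency is unity and that the mean of T is $\frac12(n_1 + \beta^2)$, which is the standard mean of a non-central sum of squares halved. The names `t_density` and `integrate`, the values $n_1 = 4$, $\beta^2 = 3$, the truncation at 120 terms and the upper limit 60 are all assumptions made for illustration.

```python
import math

def t_density(t, n1, beta2, terms=120):
    """Series form of the density of T; beta2 stands for beta squared."""
    if t <= 0.0:
        return 0.0                      # adequate for n1 > 2, as in the example below
    log_t = math.log(t)
    total = 0.0
    for p in range(terms):
        log_term = (0.5 * (n1 + 2 * p - 2) * log_t
                    - math.lgamma(0.5 * (n1 + 2 * p))   # [ (n1 + 2p - 2)/2 ]!
                    - p * math.log(2.0)
                    - math.lgamma(p + 1.0)              # p!
                    - t - 0.5 * beta2)
        if p > 0:
            if beta2 <= 0.0:
                break                   # central case: only the first term survives
            log_term += p * math.log(beta2)
        total += math.exp(log_term)
    return total

def integrate(f, upper, steps):
    """Trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

n1, beta2 = 4, 3.0
total_probability = integrate(lambda t: t_density(t, n1, beta2), 60.0, 6000)
mean_t = integrate(lambda t: t * t_density(t, n1, beta2), 60.0, 6000)
print(total_probability, mean_t)
```

With `beta2 = 0.0` the function reduces to the ordinary central form, which gives a convenient second check.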

This interpretation of the distribution previously obtained adds greatly
to its importance, for it is seen to replace the $\chi^2$ distribution of the analysis of
variance for cases in which the sum of squares corresponding to $n_1$ degrees of
freedom is derived theoretically for non-central deviations with fixed central
displacements. This will be similar to, but not identical with, the case of
the $n_1$ degrees of freedom in multiple correlation in its proper form; for although
these are non-central, the displacements will depend on the variation in the
sample of the independent variates, and this will vary from sample to sample.
In many cases, however, such as the dependence of weather upon the position
and altitudes of a number of fixed meteorological stations, we are not interested





in the effects of possible variations in the positions of the stations, but solely 
in the possible variations of the weather at these spots. In fact, the problem 
of practical importance is often that in which the central displacements are 
constant, and although it may be urged, rightly enough, that to such cases the 
purely empirical concept of multiple correlation is not the most appropriate 
approach, yet it remains true that of the practical applications of multiple 
correlation methods many are of this kind. 

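The direct extension of the analysis of variance for non-central squares
may be completed by writing

$$df = \frac{1}{[\frac12(n_2-2)]!}\, t^{\frac12(n_2-2)}\, e^{-t}\, dt, \quad \text{and} \quad \frac{T}{t} = \frac{R^2}{1-R^2};$$

then, if, in spite of the caution above, we choose to express our results in terms
of R,

$$\frac{dt}{t} = -\frac{d(R^2)}{R^2(1-R^2)}, \qquad t + T = \frac{T}{R^2},$$

and the distribution of R is found by integrating with respect to T from
0 to $\infty$ the expression

$$e^{-T/R^2 - \frac12\beta^2}\, \frac{1}{[\frac12(n_2-2)]!} \left(\frac{1-R^2}{R^2}\right)^{\frac12 n_2} \frac{d(R^2)}{R^2(1-R^2)} \sum_{p=0}^{\infty} \frac{T^{\frac12(n_1+n_2+2p-2)}\, \beta^{2p}}{[\frac12(n_1+2p-2)]!\, 2^p\, p!}\, dT,$$

a process which yields

$$df = (R^2)^{\frac12(n_1-2)}\, \frac{(1-R^2)^{\frac12(n_2-2)}}{[\frac12(n_2-2)]!}\, e^{-\frac12\beta^2} \sum_{p=0}^{\infty} \frac{[\frac12(n_1+n_2+2p-2)]!\, (\beta^2 R^2)^p}{[\frac12(n_1+2p-2)]!\, 2^p\, p!}\, d(R^2),$$

or

$$df = \frac{[\frac12(n_1+n_2-2)]!}{[\frac12(n_1-2)]!\, [\frac12(n_2-2)]!}\, (R^2)^{\frac12(n_1-2)} (1-R^2)^{\frac12(n_2-2)}\, d(R^2)\; e^{-\frac12\beta^2} \left\{1 + \frac{n_1+n_2}{n_1} \cdot \frac{\beta^2 R^2}{1!\, 2} + \ldots\right\}, \quad (C)$$

a third general distribution of this interesting group.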

Although it will not be possible within the limits of this paper to give an 
account of the properties of the distribution of Type (C), beyond indicating 
their analogy with those of Type (A), it should not be overlooked that in the 
problems in which the multiple correlation coefficient is actually employed, 
distributions of Type (C) will be, owing to the absence or irrelevance of 
sampling variation in the variances of the independent variates, of at least 
as frequent occurrence as those of Type (A). 





A typical example of the distinction here drawn is provided by the correlation
ratio. If corresponding to any value $x$ of the independent variate a number of
values, $n_x$, of the dependent variate $y$ is observed, then the correlation ratio
$\eta$ of $y$ on $x$ is defined by the relation

$$\frac{\eta^2}{1-\eta^2} = \frac{S\{n_x(\bar y_x - \bar y)^2\}}{S(y - \bar y_x)^2},$$

in which $\bar y_x$ is the mean of $y$ in any array, and $\bar y$ is the general mean; the
variance in all arrays is supposed equal, and the summation in the numerator
is applied to the several arrays, while that in the denominator is applied to the
whole of the individual observations. In most practical cases the idea of a
sampling distribution of $\eta^2$ can only be given a definite meaning by supposing
the number in each array to be the same for all samples. In such a case
the distribution of $\eta^2$ will be that of $R^2$ in distribution (C), with $n_1$ equal to one
less than the number of arrays, and $n_1 + n_2 + 1$ equal to the total number of
observations. If, however, the numbers be regarded as subject to sampling
variations, then the distribution (A) may be used, and will be exact, apart from
grouping errors, if the expectations of $y$ for the values of $x$ in the sampled
population are normally distributed.

Summary. 

By an appropriate linear transformation of the independent variates it may 
be shown that the sampling distribution of the multiple correlation coefficient 
does not depend on the whole matrix of correlations between these variates, but 
solely upon the multiple correlation in the population sampled. 

The actual distribution (A) may then easily be obtained by similar methods 
to those by which the distribution of the simple correlation coefficient has been 
obtained. 

The frequency function involves a hypergeometric function of $\rho^2 R^2$ which is
a rational function when $n_1$ and $n_2$ are both even, algebraic when only $n_2$ is even,
and reducible to circular functions when $n_1$ and $n_2$ are both odd.

The case of large samples yields a series of distributions (B) of great interest,
involving Bessel functions, which connect the distributions with the Gaussian
and are intimately related to a double Poisson summation. Owing to the
practical importance of this limiting form a table of its 5 per cent. points is
given up to seven independent variates.

When $n_2$ is even, the probability integral of the general distribution is
expressible in finite terms which are developed in section 6.




The (B) distribution of Section 5 replaces the $\chi^2$ distribution in the analysis
of variance if the squares summed are non-central. An analysis of variance
so extended leads to a third group of distributions (C), closely related to (A),
and tending like it to a common limit (B). The distinction between (A) and
(C) arises from the fact that in cases proper to the multiple correlation the
central displacements will vary from sample to sample owing to variations in
the second order moment coefficients of the independent variates, and for such
cases (A) is the correct distribution. The type (C), however, is of frequent
occurrence owing to the absence or irrelevance of such variation.



16.179a 


15 

LIMITING FORMS OF THE FREQUENCY DISTRIBUTION
OF THE LARGEST OR SMALLEST
MEMBER OF A SAMPLE


AUTHOR’S NOTE 

The distribution of the largest and smallest observation in a sample 
of given size offers considerable numerical difficulties. It is here 
shown that the limiting forms are few and comparatively simple, 
although with a normal distribution they are approached exceedingly 
slowly. 


* Reprinted from Proceedings of the Cambridge Philosophical Society, Vol. XXIV,
Pt. 2, pp. 180-190, 1928.





Limiting forms of the frequency distribution of the largest or
smallest member of a sample. By R. A. Fisher, Sc.D., Gonville
and Caius College, and L. H. C. Tippett, M.Sc.


1. Introductory.

In a previous paper on the subject of the distribution of the
largest member of a sample from a normal population, one of the
authors has given constants involving the first four moments for
samples up to 1000. In this paper, possible limiting forms of such
distributions in general are discussed. It will appear that a
particular group of distributions provides the limiting distributions
in all cases, and that the case derived from the normal curve is
peculiar for the extreme slowness with which the limiting form is
approached.


2. The possible limiting forms deduced from the functional
relation which they must satisfy.

Since the extreme member of a sample of mn may be regarded
as the extreme member of a sample of n of the extreme members
of samples of m, and since, if a limiting form exist, both of these
distributions will tend to the limiting form as m is increased
indefinitely, it follows that the limiting distribution must be such
that the extreme member of a sample of n from such a distribution
has itself a similar distribution.

If P is the probability of an observation being less than $x$, the
probability that the greatest of a sample of n is less than $x$ is $P^n$;
consequently in the limiting distributions we have the functional
equation

$$P^n(x) = P(a_n x + b_n);$$

the solutions of this functional equation will give all the possible
limiting forms.

If $a$ is not equal to unity, then

$$x = ax + b, \quad \text{when} \quad x = \frac{b}{1-a},$$

and at this point $P^n = P$,

$$P = 0 \text{ or } 1,$$

consequently the solutions fall into three classes:

I. $a = 1$, $P^n(x) = P(x + b_n)$,

II. $P = 0$ when $x = 0$, $P^n(x) = P(a_n x)$,

III. $P = 1$ when $x = 0$, $P^n(x) = P(a_n x)$.




I. If $P^n(x) = P(x + b_n)$,

then $n \log P(x) = \log P(x + b_n)$,

and $\log n + \log(-\log P(x)) = \log(-\log P(x + b_n))$;

therefore the expression $\log(-\log P(x)) - \frac{x \log n}{b_n}$ is constant, or
periodic, with period $b_n$.

Now for all values of m and n

$$b_{mn} = b_m + b_n,$$

and if $b_n$ is an analytic function of n, a supposition which excludes
the periodic solution,

$$n b'_{mn} = b'_m, \quad m b'_{mn} = b'_n,$$

whence $m b'_m = n b'_n = c$, and $b_n = c \log n + d$.

Putting n = 1, it appears that

$$d = 0.$$

Hence $\log(-\log P) = \frac{x}{c} + \text{constant}$,

or, the limiting form is that of $-\log(-\log P) = x$, for c must be
negative since $x$ is assumed to increase with P. The distribution
of the greatest of a sample of n from this distribution is

$$-\log(-\log P_n) = x - \log n,$$

the distribution being merely shifted, without change of size or
form, through a distance $\log n$. The curve is shown in Fig. 1.

Fig. 1. Distribution $p = e^{-e^{-x}}$, or $dp = e^{-x-e^{-x}}\, dx$, represented by the curve $y = e^{-x-e^{-x}}$.





II and III. If $P^n(x) = P(a_n x)$,

$$a_{mn} = a_m a_n,$$

and, if $a$ is an analytic function of n,

$$n a'_{mn} = a'_m a_n, \quad m a'_{mn} = a_m a'_n,$$

whence

$$\frac{m a'_m}{a_m} = \frac{n a'_n}{a_n},$$

of which the solution is

$$\log a_n = -\frac{1}{k} \log n, \quad a_n = n^{-1/k},$$

since $a = 1$ when $n = 1$.

Now $\log(-\log P(x))$ is increased by $\log n$ when $\log x$ is increased
by $\log a_n$, so that, excluding as before the periodic solution,

$$\frac{d \log(-\log P(x))}{d \log x}$$

must be constant. This gives

$$\log(-\log P(x)) = -k(\log x + c)$$

or $-\log P(x) = (Ax)^{-k}$.

If $P = 0$ when $x = 0$, k will be positive (II).

The form of the curve is then that of

$$P = e^{-x^{-k}}, \quad dP = k x^{-k-1} e^{-x^{-k}}\, dx, \text{ where k is positive.}$$

If $P = 1$ when $x = 0$, k will be negative and all possible values
of $x$ will be negative; in this case (III) the form of the curve is
given by

$$-\log P = (-x)^k, \text{ where k is positive,}$$

$$P = e^{-(-x)^k}.$$

The only possible limiting curves are therefore:

I. $dP = e^{-x-e^{-x}}\, dx$,

in which the effect of selecting the greatest value of a sample
of n is merely to shift the curve, without affecting its scale, through
a distance $\log n$.

II. $dP = k x^{-k-1} e^{-x^{-k}}\, dx$,

in which the effect of selection is to increase the scale of the curve
by the factor $n^{1/k}$, maintaining the terminus $x = 0$ unchanged.

III. $dP = k(-x)^{k-1} e^{-(-x)^k}\, dx$,

in which the effect of selection is to decrease the scale of the
curve by the factor $n^{-1/k}$, while maintaining the terminus $x = 0$
unchanged. In this case alone will the selected curve increase
materially in accuracy as selection is increased; the weight of an
observation, from curves of constant form, will be inversely proportional
to the square of the scale, and will be proportional to
$n^{2/k}$. The accuracy of the extreme observation will therefore increase
more rapidly than that of, for example, the mean, if k is less than 2.
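The defining property of the three limiting curves, that the greatest of n values has the same form apart from a shift or a change of scale, lends itself to a direct numerical check. The sketch below (Python, standard library only) compares $P(x)^n$ with the shifted or rescaled P on a grid; the function names, the grid and the values of n and k are hypothetical choices made for illustration.

```python
import math

def type_i(x):
    """P = exp(-exp(-x)) for all real x."""
    return math.exp(-math.exp(-x))

def type_ii(x, k):
    """P = exp(-x^(-k)) for x > 0, and P = 0 otherwise."""
    return math.exp(-x ** (-k)) if x > 0.0 else 0.0

def type_iii(x, k):
    """P = exp(-(-x)^k) for x < 0, and P = 1 otherwise."""
    return math.exp(-((-x) ** k)) if x < 0.0 else 1.0

def largest_gap(n, k):
    """Largest discrepancy between P(x)^n and the shifted or rescaled P on a grid."""
    gap = 0.0
    for i in range(1, 60):
        x = 0.1 * i
        # I: a shift through log n.
        gap = max(gap, abs(type_i(x - 3.0) ** n - type_i(x - 3.0 - math.log(n))))
        # II: scale increased by n^(1/k), terminus x = 0 unchanged.
        gap = max(gap, abs(type_ii(x, k) ** n - type_ii(x * n ** (-1.0 / k), k)))
        # III: scale decreased, abscissa multiplied by n^(1/k) inside P.
        gap = max(gap, abs(type_iii(-x, k) ** n - type_iii(-x * n ** (1.0 / k), k)))
    return gap

worst = max(largest_gap(n, k) for n in (2, 10, 1000) for k in (0.5, 2.0, 7.0))
print(worst)
```

Any discrepancy reported is rounding error only, since each identity holds exactly.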

3. The limiting form appropriate to any particular frequency 
distribution. 

If in any frequency distribution p is the probability of an
observation being less than $x$, and if as $p \to 1$ the quantity

$$(1 - p)\, x^k$$

tends to a finite limit, $a^k$, then it is evident that $p^n$ will have
the form

$$P = e^{-n a^k x^{-k}}$$

in the limit for large samples of n.

Since, for any two values of P other than 0 and 1, the values
of $x$ as n tends to infinity tend to the finite ratio of the values of

$$(-\log P)^{-1/k},$$

the limiting form of the distribution will be the same if

$$1 - p = x^{-k} \phi(x),$$

where the range of $\log \phi$, for any finite range of $\log x$, tends to zero
as $x$ tends to infinity.

The scale of the distribution for the greatest of w., measured 
by will in such cases approach the limit 

where the argument of if> is given by the equation 

(a;). 

Equally, for any frequency distribution for which

$$(1 - p)\, e^{x/c}$$

tends to a finite limit A as p tends to unity, the limiting forms
of the distribution of the largest of a sample of n will be given by

$$P = e^{-nA e^{-x/c}}.$$




Since, in this case, for any two values of P other than 0 and 1,
the difference of the two values of $x/c$ tends to a constant value,
the limiting form of distribution will be the same if

$$1 - p = \phi(x)\, e^{-x/c},$$

when the range of $\log \phi$ in any finite range of $x/c$ tends to zero as
$x$ tends to infinity. Thus, if c is constant, $\phi$ may contain factors
such as $x^r$. The location of the distribution, given by

$$x = c \log(n\phi),$$

will then, as the limiting form is approached, change as $c \log(n\phi)$,
in which the argument of $\phi$ is given by the equation

$$x = c \log(n\phi(x)).$$

The case in which c is constant does not exhaust the applications
of this limiting form, for whatever function $1 - p$ may be of
$x$, if we write

$$\frac{1}{c} = -\frac{d}{dx} \log(1 - p),$$

then, if the range of $\log(1 - p) + x/c$ from $x = \xi$ to $x = \xi + ct$ tends
to zero, as $x$ tends to infinity, for all real values of t, then will the
same limiting form be valid.

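For example, let

$$1 - p = e^{-x^r},$$

then

$$c = \frac{1}{r\xi^{r-1}}, \quad \text{and} \quad \frac{ct}{\xi} = \frac{t}{r\xi^r},$$

which tends to zero, if r is positive, for all values of t.

But

$$\log(1 - p) + x/c = -x^r + r x \xi^{r-1} = \xi^r (r - 1) + \text{smaller terms};$$

the range will therefore tend to zero, for

$$\frac{r(r-1)}{2}\, \xi^{r-2} c^2 t^2 = \frac{r-1}{2r} \cdot \frac{t^2}{\xi^r},$$

which tends to zero, for all values of t, if r is positive.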




The parameter c, which measures the scale of the distribution,
will increase if r < 1, and decrease if r > 1, while the location of the
mode as the limiting form is approached is given in general by

$$n(1 - p) = 1.$$

Again, for the normal curve with unit standard deviation

$$\log(1 - p) = -\tfrac12 x^2 - \log x - \tfrac12 \log(2\pi) - \lambda,$$

where $\lambda$ tends to zero as $x$ tends to infinity,

$$\log(1 - p) + x/c = -\tfrac12 x^2 - \log x + x\xi + 1 - \tfrac12 \log(2\pi) - \lambda$$
$$= \tfrac12 \xi^2 - \log \xi + 1 - \tfrac12 \log(2\pi) - \lambda,$$

where $\lambda$ vanishes as $x \to \infty$, at all values of $x$ from $\xi$ to $\xi + ct$.
For sufficiently large samples of n from a normal curve, the
distribution of the largest of the sample will be centred about a
mode m given by

$$e^{\frac12 m^2}\, m \sqrt{2\pi} = n,$$

with scale given by

$$c = \frac{m}{m^2 + 1}.$$

4. The approach of the distribution of the greatest of a normal
sample to its final form.

The final form for the largest of a normal sample has been
shown to be given by

$$P = e^{-e^{-(x-m)/c}},$$

where c diminishes to zero as the sample increases, in such a way
that to the degree of approximation required in very large samples

$$c = \frac{m}{m^2 + 1},$$

and $e^{\frac12 m^2}\, m \sqrt{2\pi} = n$.

Since for any finite value of m, however large, c will still be
diminishing as n increases, the case has an analogy at any stage
with the distribution derived from

$$P^n = e^{-n(-x)^k},$$

in which also the scale diminishes as n increases. This analogy
may be utilised by equating the rate of change of the scale with
increasing n in the two cases.

Now, for $P = e^{-n(-x)^k}$

we have $dP = kn(-x)^{k-1} e^{-n(-x)^k}\, dx$,

so that the logarithm of the ordinate at any point is

$$(k - 1) \log(-x) - n(-x)^k + \text{constant},$$

giving as equation for the mode, m,

$$(-x)^k = \frac{k-1}{nk},$$

whence

$$\frac{d \log(-x)}{d(\log n)} = -\frac{1}{k}.$$

But for the normal curve

$$\frac{d \log c}{d \log n} = \frac{d \log c}{dm} \cdot \frac{dm}{d \log n} = -\frac{m^2 - 1}{(m^2 + 1)^2}.$$

Hence the distribution in which

$$h = \frac{1}{k} = \frac{m^2 - 1}{(m^2 + 1)^2}$$

should provide a penultimate form of approximation, which will
duly tend to the ultimate form as h tends to zero.
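To see how slowly h falls with the sample number, the relations $e^{\frac12 m^2} m \sqrt{2\pi} = n$, $c = m/(m^2+1)$ and $h = (m^2-1)/(m^2+1)^2$ can be evaluated numerically. The sketch below (Python, standard library only) solves the first equation for m by bisection on the logarithmic scale; the function names and the sample numbers tried are illustrative assumptions.

```python
import math

def mode_of_normal_maximum(n):
    """Solve exp(m^2/2) * m * sqrt(2*pi) = n for m > 0 by bisection on logarithms."""
    target = math.log(n)

    def excess(m):
        return 0.5 * m * m + math.log(m) + 0.5 * math.log(2.0 * math.pi) - target

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0.0:          # excess() is increasing in m, so bracket the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def penultimate_parameters(n):
    """Return (m, c, h) for a normal sample of n, using the large-sample relations."""
    m = mode_of_normal_maximum(n)
    c = m / (m * m + 1.0)
    h = (m * m - 1.0) / (m * m + 1.0) ** 2
    return m, c, h

for n in (1e3, 1e6, 1e9, 1e12):
    m, c, h = penultimate_parameters(n)
    print(f"n = {n:.0e}  m = {m:.4f}  c = {c:.4f}  h = {h:.4f}")
```

Bisection is used here only because it needs no derivative and cannot diverge; any root finder would serve.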


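5. The moments of the ultimate and penultimate forms.

The moments of the ultimate form

$$dP = e^{-x-e^{-x}}\, dx$$

may be found most directly from the generating function of the
semi-invariants

$$K = \log M,$$

where $M = \int e^{tx}\, dP$.

For, writing $z$ for $e^{-x}$,

$$M = \int_0^{\infty} z^{-t} e^{-z}\, dz = (-t)!$$

and $\log(-t)! = \gamma t + \frac{t^2}{2}\left\{1 + \frac{1}{2^2} + \frac{1}{3^2} + \ldots\right\} + \frac{t^3}{3}\left\{1 + \frac{1}{2^3} + \frac{1}{3^3} + \ldots\right\} + \ldots$,

whence it follows that the distance of the mean from the mode is

$$\gamma = .577215665,$$

the variance is $\frac{\pi^2}{6} = 1.64493407$,

the third moment is

$$2\left\{1 + \frac{1}{2^3} + \frac{1}{3^3} + \ldots\right\} = 2.40411381,$$

while the fourth moment is given by

$$\mu_4 - 3\mu_2^2 = 6\left\{1 + \frac{1}{2^4} + \frac{1}{3^4} + \ldots\right\} = \frac{\pi^4}{15} = 6.4939394.$$

Consequently, for sufficiently large samples we shall have

Mean − Mode $= \gamma c = \frac{\gamma m}{m^2 + 1}$,

Variance $= \frac{\pi^2}{6} c^2$,

$\beta_1 = 1.2985676$,

$\beta_2 = 5.4$.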
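The constants quoted for the ultimate form can be reproduced from the series for the semi-invariants. The sketch below (Python, standard library only) sums the series for the third semi-invariant directly, obtains $\gamma$ from the harmonic sum with the usual end corrections, and forms $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. The helper names and the number of terms summed are assumptions made for the illustration.

```python
import math

def power_sum(s, terms=200000):
    """Partial sum of 1 + 1/2^s + 1/3^s + ..., added smallest terms first."""
    return sum(1.0 / k ** s for k in range(terms, 0, -1))

def euler_gamma(terms=100000):
    """Euler's constant from the harmonic sum with two correction terms."""
    harmonic = sum(1.0 / k for k in range(terms, 0, -1))
    return harmonic - math.log(terms) - 1.0 / (2.0 * terms) + 1.0 / (12.0 * terms ** 2)

gamma = euler_gamma()              # distance of the mean from the mode
kappa2 = math.pi ** 2 / 6.0        # variance
kappa3 = 2.0 * power_sum(3)        # third moment about the mean
kappa4 = math.pi ** 4 / 15.0       # mu_4 - 3 * mu_2^2

beta1 = kappa3 ** 2 / kappa2 ** 3
beta2 = 3.0 + kappa4 / kappa2 ** 2
print(gamma, kappa2, kappa3, kappa4, beta1, beta2)
```

The value $\beta_2 = 5.4$ is exact, since $(\pi^4/15)/(\pi^2/6)^2 = 36/15$.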

For the penultimate form

$$P = e^{-(-x)^k},$$

writing $-x = t^{1/k} = t^h$,

we have $dP = -e^{-t}\, dt$,

and $\mu'_r = (-)^r \int (-x)^r\, dP = (-)^r \int t^{hr} e^{-t}\, dt = (-)^r (hr)!$;

also the mode is given by

$$-x = (1 - h)^h.$$

Hence we have as penultimate formulae

Mean − Mode $= \frac{c}{h} \{(1 - h)^h - h!\}$,

Variance $= \frac{c^2}{h^2} \{(2h)! - (h!)^2\}$,

together with $\beta_1$ and $\beta_2$ expressed in terms of h only.
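The penultimate formulae are easily evaluated with the gamma function, reading $(hr)!$ as $\Gamma(hr + 1)$. The sketch below (Python, standard library only) returns the distance from mode to mean, the variance and the two $\beta$ coefficients for a given h; the function names, the default scale c = 1 and the values of h tried are assumptions for illustration. For very small h the fourth moment is formed from nearly cancelling terms, so the sketch is not suitable below about h = 0.005.

```python
import math

def fisher_factorial(x):
    """x! for non-integral x, taken as Gamma(x + 1)."""
    return math.gamma(x + 1.0)

def penultimate_summary(h, c=1.0):
    """Mean minus mode, variance, beta_1 and beta_2 of the penultimate form."""
    m1 = fisher_factorial(h)          # moments of -x = t^h about the origin
    m2 = fisher_factorial(2.0 * h)
    m3 = fisher_factorial(3.0 * h)
    m4 = fisher_factorial(4.0 * h)
    mean_minus_mode = (c / h) * ((1.0 - h) ** h - m1)
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    variance = (c / h) ** 2 * mu2
    beta1 = mu3 ** 2 / mu2 ** 3       # unchanged by the sign and scale of x
    beta2 = mu4 / mu2 ** 2
    return mean_minus_mode, variance, beta1, beta2

for h in (0.1, 0.05, 0.01):
    print(h, penultimate_summary(h))
```

As h diminishes the four quantities approach the ultimate values $\gamma c$, $\frac{\pi^2}{6} c^2$, 1.2985676 and 5.4.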

The extreme slowness with which the ultimate form is approached
is well shown by the fact that even for enormous samples
the penultimate form is still materially different in its $\beta$ coefficients.
The following tables show, for different values of h, the corresponding
values of m and n, and, in parallel columns, the distance
of the mean from the mode, the variance and the $\beta$ coefficients.
It will be observed that even for samples of nearly a billion the
penultimate form is still considerably different from the ultimate
form. The appropriateness of the penultimate form for samples
of 1000 downwards can be tested from the results given in a
previous paper*, using for m the value of $x$ for which $p = 1/n$, and
the corresponding values of c and h.

* Biometrika, XVII, pp. 364-387 (1925).



Table A. 





form for very large samples. The distance from mode to mean is
given correctly for samples somewhere between 100 and 200 and










is underestimated by an amount which seems to attain a maximum
of about 1 for samples of about 1000, whereas the value for the
ultimate form is over 7 in error for samples of nearly a billion.
The standard deviation is given by the penultimate form with a
negative error of about 2^ 7o lOW and only about 4^ Vo 
while the ultimate form is nearly 10% out at 1000, and just
under 3% at a billion. In both comparisons the largest deviations
occur in the $\beta$ coefficients. The latter are consistently too low in
the penultimate form for samples of 1000 and less, and probably
do not attain a close approximation until the sample number is
nearly a million, while an equally good approximation to the
ultimate values $\beta_1 = 1.299$ and $\beta_2 = 5.4$ would only be attained by
such incredibly large samples as are represented by values of about
.004 for h (c. 10“). The changes in $\beta_1$ and $\beta_2$ with varying h,
together with the actual values for samples up to 1000, are shown
in Figs. 2 and 3.








The limiting distribution, when n is large, of the greatest or
least of a sample of n, must satisfy a functional equation which
limits its form to one of two main types. Of these one has, apart
from size and position, a single parameter h, while the other is the
limit to which it tends when h tends to zero.

The appropriate limiting distribution in any case may be found
from the manner in which the probability of exceeding any value
$x$ tends to zero as $x$ is increased. For the normal distribution the
limiting distribution has h = 0.

From the normal distribution the limiting distribution is
approached with extreme slowness; the final series of forms passed
through as the ultimate form is approached may be represented
by the series of limiting distributions in which h tends to zero in
a definite manner as n increases to infinity.

Numerical values are given for the comparison of the actual
with the penultimate distributions for samples of 60 to 1000, and
of the penultimate with the ultimate distributions for larger
samples.




16.5aa 


16 

TESTS OF SIGNIFICANCE IN HARMONIC 
ANALYSIS 


AUTHOR’S NOTE 

A certain amount of disparity in usage and opinion among meteorologists
wishing to test the significance of supposed periodicities had
arisen in the absence of any treatment of this subject by the principles
applicable to small samples.

The reader will notice that the periods of the components discussed
in this paper are not integral multiples of the unit interval of observation,
but integral submultiples of the entire period of observation.
Using these it is shown that an exact solution can take account of
the two circumstances which had given trouble (i) that the supposed
periodicity to be tested is usually selected as having the largest apparent
amplitude, and (ii) that the variability of the series must be
estimated from the data, and needs to be eliminated from the test
of significance.

The treatment of a frequency distribution consisting of a series
of polynomial arcs with the highest possible contact at the points
of discontinuity is of some mathematical interest.

I have also used this example to illustrate the properties of estimates
involving non-linear functions of the frequencies (Paper 36).

The exact tests arrived at were embodied in a short table, designed
to meet immediate and practical needs. A fuller table is published
with this edition, giving for all values of n the number of submultiple
periods available, from 5 to 50, both the 5 per cent and the 1 per cent
values. The test is now so direct and easy that the urge to ascribe
significance to casual fluctuations in time series, which all who deal
with such series must have felt, should be capable of rational control.


* Reprinted from Proceedings of the Royal Society, Series A, Vol. 125, pp. 54-59,
1929.



16.54 


Tests of Significance in Harmonic Analysis.

By R. A. Fisher, F.R.S.

1. Schuster's Test.

If a series $u_1, u_2, \ldots, u_{2n+1}$ constitute a random sample from a normally
distributed population, then any linear function

$$A = S_{r=1}^{2n+1} (a_r u_r)$$

will also be normally distributed; moreover its mean will be zero if $S(a_r) = 0$,
and its variance will be equal to that of the original population if

$$S(a_r^2) = 1.$$

Any other linear function

$$B = S_{r=1}^{2n+1} (b_r u_r)$$

will be distributed independently of the first if

$$S(a_r b_r) = 0,$$

and in this case the sum of the squares,

$$x = A^2 + B^2,$$

will be distributed so that the chance of exceeding any particular value of
$x$ is

$$e^{-x/c},$$

where c is the mean value of $x$, equal to twice the variance of the population
sampled.

This proposition, which gives the $\chi^2$ distribution for the particular case
n = 2, is the basis of Schuster's test of the significance of any particular term
in the harmonic analysis of a series. For the coefficients

$$a_r = \sqrt{\frac{2}{2n+1}} \cos \frac{2\pi pr}{2n+1},$$

$$b_r = \sqrt{\frac{2}{2n+1}} \sin \frac{2\pi pr}{2n+1},$$

fulfil the necessary conditions for all integral values of p. Values of p from
1 to n give independently distributed values of $x$ and, if the variance of the
population were known a priori, the test would be rigorous for any one of
these chosen in advance.
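The conditions on the coefficients, and the harmonic terms themselves, can be illustrated with a short computation. The sketch below (Python, standard library only) builds $a_r$ and $b_r$ for a series of $2n + 1$ values, checks the three conditions, and forms $x_p = A^2 + B^2$ for p = 1 to n. The function names and the seven observations are invented for the illustration; the final check, that the n terms add up to the sum of squares of deviations from the mean, is the property of the Fourier submultiples which the paper uses later for the denominator of g.

```python
import math

def coefficients(p, size):
    """Coefficients a_r, b_r (r = 1..size) for the p-th submultiple period."""
    norm = math.sqrt(2.0 / size)
    a = [norm * math.cos(2.0 * math.pi * p * r / size) for r in range(1, size + 1)]
    b = [norm * math.sin(2.0 * math.pi * p * r / size) for r in range(1, size + 1)]
    return a, b

def harmonic_terms(u):
    """x_p = A^2 + B^2 for p = 1..n, from an odd number 2n + 1 of observations."""
    size = len(u)
    if size % 2 == 0:
        raise ValueError("this sketch expects an odd number of observations")
    terms = []
    for p in range(1, (size - 1) // 2 + 1):
        a, b = coefficients(p, size)
        big_a = sum(ar * ur for ar, ur in zip(a, u))
        big_b = sum(br * ur for br, ur in zip(b, u))
        terms.append(big_a ** 2 + big_b ** 2)
    return terms

u = [3.1, 0.4, -1.2, 2.5, 0.0, -0.7, 1.9]      # hypothetical series, n = 3
x = harmonic_terms(u)
mean_u = sum(u) / len(u)
sum_of_squares = sum((ur - mean_u) ** 2 for ur in u)
print(x, sum(x), sum_of_squares)
```

The direct sums cost of the order of $n^2$ operations, which is ample for the short series the test is meant for.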

2. Allowance for Selection of the Largest Term.

The practice of picking out the larger values of $x$, not in advance, but by
reason of their exceptional magnitude, requires, as Sir Gilbert Walker has
shown, an important modification of the test of significance. For, if we wish
to test the significance of the largest observed value of $x$, we must compare the
value observed with the sampling distribution of the largest of n independent
values, and not with that of any one value chosen in advance. If P stand for
the probability which we adopt, as sufficiently small to be used as a criterion
of significance, the corresponding value of $x$ will be given by

$$e^{-x/c} = P,$$

for any particular term, but if $x$ is chosen to be the largest of n independent
values, it is necessary that the probability should be 1 − P that all the n
values shall be less than $x$, so that

$$(1 - e^{-x/c})^n = 1 - P$$

is the equation which determines the least value of $x$ to be judged significant.
This is the criterion derived by Walker.
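Both criteria can be solved explicitly for $x$. The sketch below (Python, standard library only) gives the least significant value for a term chosen in advance and for the largest of n terms, with c assumed known; the function names and the example values P = 0.05, n = 20 are illustrative assumptions.

```python
import math

def schuster_critical_value(prob, c=1.0):
    """Least significant x for a term chosen in advance: exp(-x/c) = P."""
    return -c * math.log(prob)

def walker_tail(x, n, c=1.0):
    """Chance that the largest of n independent terms exceeds x."""
    return 1.0 - (1.0 - math.exp(-x / c)) ** n

def walker_critical_value(prob, n, c=1.0):
    """Least significant x for the largest of n terms: (1 - exp(-x/c))^n = 1 - P."""
    # 1 - (1 - P)^(1/n), formed with expm1 and log1p to limit rounding error.
    per_term = -math.expm1(math.log1p(-prob) / n)
    return -c * math.log(per_term)

x_single = schuster_critical_value(0.05)
x_largest = walker_critical_value(0.05, 20)
print(x_single, x_largest, walker_tail(x_largest, 20))
```

The largest term must therefore be considerably greater than a term chosen in advance before it can be judged significant at the same level.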

3. Allowance for the Sampling Error of the Estimated Variance.

In the practical application of this criterion, when c is not known a priori,
it is necessary to substitute for c an estimate of it derived from the data, and,
for an exact test, to take into account the sampling error of this estimate. The
estimate of c will necessarily be based on the variance observed in the original
sample, or, what comes to the same thing, on the average value of $x$ for
the n possible periods; and, whether we take, as our actual estimate,
the average of all the n values, or the average of the (n − 1) values
other than that to be tested, all that is required for an exact solution, in either
case, is the frequency distribution of the largest of n values of $x$, expressed as a
fraction of the total of the sample of n of which it is the largest member.

If $x_1, x_2, \ldots, x_n$ are the co-ordinates of a point in Euclidian space of n dimensions,
the simultaneous distribution of the n values will be represented by a density
function

$$e^{-\frac{1}{c}(x_1 + x_2 + \ldots + x_n)},$$

which is constant over plane finite regions of n − 1 dimensions, bounded by the
n surfaces

$$x_1 = 0, \; x_2 = 0, \; \ldots, \; x_n = 0$$

in the form of a generalised tetrahedron. In every such region, the distribution
of the ratio of the largest co-ordinate to the sum of all co-ordinates will be
the same, and, since the density is constant over each such region, the
distribution is to be found merely from the elements of generalised volume,
into which the region is divided for fixed values of the ratio. Any particular
co-ordinate, e.g., $x_1$, will be the greatest in one nth of the whole region, this
fraction being bounded on the one hand by the loci, at which it ceases to be
greatest,

$$x_1 = x_2, \; x_1 = x_3, \; \ldots, \; x_1 = x_n,$$

and, on the other, by the boundaries,

$$x_2 = 0, \; x_3 = 0, \; \ldots, \; x_n = 0;$$

within this region it is required to find the distribution of the ratio

$$g = \frac{x_1}{x_1 + x_2 + \ldots + x_n}.$$

4. The Discontinuities of the Distribution.

The distribution defined geometrically by the dissection of a generalised
tetrahedron exhibits a number of discontinuities; the linear regions which
constitute its boundary intersect n − 1 at a time at the sets of points
typified by

$$x_1 = x_2 = x_3 = \ldots = x_n$$

$$x_2 = 0, \; x_1 = x_3 = \ldots = x_n$$

$$x_2 = x_3 = 0, \; x_1 = x_4 = \ldots = x_n$$

$$\ldots$$

$$x_2 = x_3 = \ldots = x_n = 0,$$

at which it is evident that the values of g are

$$g = \frac{1}{n}, \; \frac{1}{n-1}, \; \frac{1}{n-2}, \; \ldots, \; 1,$$

representing in succession, the centre of the generalised tetrahedron, the
centres of all its bounding faces, of successively lower dimensions, which meet
in the point, g = 1, the middle points of the edges running from this point,
and finally the limiting point, g = 1, itself. Hence g is distributed over the
range from 1/n to 1; and for an exact test of significance we require to know
the probability with which any particular value between these limits is
exceeded.


5. The Exact Distribution.

A point about the distribution which greatly facilitates the solution, is that
within the region between any two discontinuities the probability integral of
the distribution is merely a polynomial in g of degree n − 1. For the
boundaries of any region, $g = g_1$, change the magnitude of their elements
continuously at rates determined by the magnitude of their boundaries, and
so on down to the bounding edges, the lengths of which are linear functions of
g; consequently the probability integral is in each region a polynomial of
degree n − 1, but from region to region the (n − 1)th differential coefficient
with respect to g changes discontinuously.

We may therefore represent the probability integral by the form

$$P = a_1 (1 - g)^{n-1} + a_2 (1 - 2g)^{n-1} + \ldots + a_n (1 - ng)^{n-1},$$

in which as many terms are to be taken as have positive quantities within the
brackets. The last term is therefore included for no possible value of g, but
is written above in order to utilise the condition that when g < 1/n the
probability integral shall be unity. This condition is sufficient to determine
the n coefficients by equation of the coefficients of $g^0, g^1, \ldots, g^{n-1}$.

To determine their actual values let

$$f = -1 + a_1 t + a_2 t^2 + \ldots + a_n t^n,$$

then the equations of the successive coefficients give

$$f = 0, \; \left(t\frac{d}{dt}\right) f = 0, \; \ldots, \; \left(t\frac{d}{dt}\right)^{n-1} f = 0,$$

at the value t = 1. These are evidently equivalent to

$$f = 0, \; \frac{d}{dt} f = 0, \; \ldots, \; \frac{d^{n-1}}{dt^{n-1}} f = 0,$$

for the same value, so that f, being of degree n, must be a numerical multiple
of $(t - 1)^n$, or, in view of its first term,

$$f = -(1 - t)^n.$$

We have therefore the probability integral in the form

$$P = n(1 - g)^{n-1} - \frac{n(n-1)}{2} (1 - 2g)^{n-1} + \ldots + (-)^{k-1} \frac{n!}{k!\,(n-k)!} (1 - kg)^{n-1},$$

where k is the greatest integer less than 1/g.
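The probability integral is simple to evaluate as it stands. The sketch below (Python 3.8 or later, standard library only) sums the terms having positive quantities within the brackets, and checks the result against two 5 per cent. values of g quoted in the paper's table (n = 5 and n = 10) and against the condition that the integral is unity at g = 1/n. The function name `largest_term_tail` is an assumption for the illustration.

```python
import math

def largest_term_tail(g, n):
    """Chance that the largest of n harmonic terms, as a fraction of their total,
    exceeds g; terms are taken while 1 - j*g remains positive."""
    if g <= 1.0 / n:
        return 1.0
    if g >= 1.0:
        return 0.0
    total = 0.0
    j = 1
    while j <= n and 1.0 - j * g > 0.0:
        sign = 1.0 if j % 2 == 1 else -1.0
        total += sign * math.comb(n, j) * (1.0 - j * g) ** (n - 1)
        j += 1
    return total

print(largest_term_tail(0.68377, 5))    # tabulated 5 per cent. point for n = 5
print(largest_term_tail(0.44495, 10))   # tabulated 5 per cent. point for n = 10
```

For large n and g close to 1/n the alternating terms become large and nearly cancel, so a sketch of this kind is best confined to the small probabilities needed in tests of significance.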

6. Summary and Table of 5 per cent. Values.

A practical convenience of the form which has been obtained for the probability
integral, is that for small values of P, such as are needed in tests of
significance, the magnitude of the successive terms decreases very rapidly,
so that even when, as at the 5 per cent. point for n = 50, as many as seven
terms exist, very high precision is obtained from the first three terms only.
Indeed the first term alone gives a very satisfactory approximate test of
significance. The first term has, moreover, a simple meaning in relation to a
related statistical problem. There are, in fact, four related distributions each
of which is the appropriate solution of one of four problems.

(I) The distribution of any one harmonic term obtained from a random
sample of numbers drawn from a population of known variance. Schuster's
solution of this is given by the distribution of the form

$$P = e^{-x}. \quad (1)$$

(II) The distribution of the largest of the n harmonic terms obtained from a
similar sample; for this we have Walker's solution

$$P = 1 - (1 - e^{-x})^n. \quad (2)$$

(III) We may ask what is the distribution of any one harmonic term as a
fraction of the total (or mean) of the terms obtained from the same sample;
here there is no restriction that our term should be the largest, and all points
within the generalised tetrahedron are available, so that

$$P = (1 - g)^{n-1}, \quad (3)$$

where g is the chosen term expressed as a fraction of the whole.

(IV) Finally the probability that the largest of the n terms should exceed g
is, so long as this probability is small, naturally not far from n times the value
given by (3), and has been shown to be exactly

$$P = n(1 - g)^{n-1} - \ldots + (-)^{k-1} \frac{n!}{k!\,(n-k)!} (1 - kg)^{n-1}, \quad (4)$$

where k is the largest integer less than 1/g.





How good an approximation is obtained by using the first term only, is
shown by the following table giving the 5 per cent. values of g for values of n
from 5 to 50 in a parallel column with those obtained by ignoring all terms
after the first.

     n.     g (by exact formula).     g (by first term only).
      5           0.68377                   0.68377
     10           0.44495                   0.44495
     15           0.33462                   0.33463
     20           0.27040                   0.27046
     25           0.22805                   0.22813
     30           0.19784                   0.19794
     35           0.17513                   0.17525
     40           0.15738                   0.15752
     45           0.14310                   0.14324
     50           0.13135                   [illegible]

This table can be used directly in testing significance; the 5 per cent. point 
is the lowest level of significance likely to be wanted, and for higher levels, 
such as the 1 per cent. point, the first term will provide an even closer 
approximation. The method of section 5 should be useful in many distribution 
problems involving points of discontinuity. 
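When only the first term n(1 - g)^(n-1) is retained, the value of g for any chosen level of significance can be written down directly. A short sketch in Python follows; the function name and the default level are assumptions made for illustration.

```python
def first_term_critical_g(n: int, p: float = 0.05) -> float:
    """Solve n (1 - g)^(n - 1) = p for g, ignoring all later terms."""
    return 1.0 - (p / n) ** (1.0 / (n - 1))


# First-term values at the 5 per cent. and 1 per cent. levels
for n in (5, 10, 25, 45):
    print(n,
          round(first_term_critical_g(n), 5),
          round(first_term_critical_g(n, 0.01), 5))
```

The values so obtained are never smaller than the exact ones, since the second term of the series is subtracted.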

The value of g may in all cases be very easily obtained. If all the Fourier 
submultiples have been worked out, it is, as already defined, 

$$g = \frac{x}{x_1 + x_2 + \cdots + x_n}.$$

The denominator of this expression is, however, merely 

$$\sum_{r=1}^{2n+1} (u_r - \bar{u})^2.$$

In the case where the number of observations in the series is even, (2n + 2), 
we need still only consider the n complete harmonic terms, and can obtain 
their sum as 

$$\sum_{r=1}^{2n+2} (u_r - \bar{u})^2 - \frac{(u_1 - u_2 + u_3 - \cdots - u_{2n+2})^2}{2n + 2}.$$
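The two shortcuts for the denominator can be verified on any series of numbers. The sketch below, in Python, assumes that each harmonic term is scaled as 2(a² + b²)/N, where a and b are the cosine and sine sums for that period and N is the length of the series; this scaling is an assumption adopted here so that the terms add up to the sum of squares about the mean, and the function names are illustrative.

```python
import math


def harmonic_terms(u):
    """Sums of squares for the n complete harmonic terms of the series u."""
    N = len(u)
    n = (N - 1) // 2  # N = 2n + 1 or N = 2n + 2
    terms = []
    for p in range(1, n + 1):
        a = sum(x * math.cos(2 * math.pi * p * r / N) for r, x in enumerate(u))
        b = sum(x * math.sin(2 * math.pi * p * r / N) for r, x in enumerate(u))
        terms.append(2.0 * (a * a + b * b) / N)
    return terms


def denominator(u):
    """Total of the n harmonic terms, found without any Fourier work."""
    N = len(u)
    mean = sum(u) / N
    total = sum((x - mean) ** 2 for x in u)
    if N % 2 == 0:
        # even series: remove the alternating component u1 - u2 + u3 - ...
        alt = sum(x if r % 2 == 0 else -x for r, x in enumerate(u))
        total -= alt * alt / N
    return total


def g_statistic(u):
    """Largest harmonic term as a fraction of the total."""
    return max(harmonic_terms(u)) / denominator(u)


odd_series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]        # 2n + 1 = 7 values
even_series = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # 2n + 2 = 8 values
print(g_statistic(odd_series), g_statistic(even_series))
```

In both cases n = 3, and the value of g found is to be compared with the tabulated value for that n.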





Significant Values for the Ratio of the Sum of Squares for the 
Most Significant Period to the Total for n Periods Obtainable from 
a Sequence of (2n + 1) or (2n + 2) Successive Observations. 


  n     5%       1%         n     5%       1%         n     5%       1%

  5   .68377   .78874      20   .27040   .32971      35   .17513   .21338
  6   .61615   .72179      21   .26060   .31783      36   .17124   .20860
  7   .56115   .66440      22   .25155   .30683      37   .16754   .20405
  8   .51569   .61517      23   .24315   .29661      38   .16400   .19970
  9   .47749   .57271      24   .23534   .28709      39   .16062   .19554
 10   .44495   .53584      25   .22805   .27819      40   .15738   .19156
 11   .41688   .50357      26   .22123   .26986      41   .15429   .18776
 12   .39240   .47510      27   .21483   .26205      42   .15132   .18411
 13   .37085   .44982      28   .20883   .25470      43   .14847   .18060
 14   .35172   .42722      29   .20317   .24778      44   .14573   .17724
 15   .33461   .40689      30   .19784   .24124      45   .14310   .17401
 16   .31922   .38851      31   .19280   .23506      46   .14057   .17089
 17   .30529   .37180      32   .18803   .22921      47   .13814   .16789
 18   .29262   .35655      33   .18351   .22366      48   .13579   .16501
 19   .28104   .34257      34   .17921   .21839      49   .13353   .16222
 20   .27040   .32971      35   .17513   .21338      50   .13135   .15954


Table of g, for testing the significance of the leading periodic component of a 
series of 2n + 1 or 2n + 2 consecutive values. Each of n components contributes 
a certain fraction to the total sum of squares, and g is taken to be the largest of 
these fractions. If this exceeds the corresponding tabulated value, significant 
evidence of periodicity is indicated. 





17 

THE ARRANGEMENT OF FIELD 
EXPERIMENTS 


AUTHOR’S NOTE 

From about 1923 onwards the Statistical Department at Rothamsted 
had been much concerned with the precision of field experiments in 
agriculture, and with modifications in their design, having the dual 
aim of increasing the precision and of providing a valid estimate of 
error. 

These two desiderata had been somewhat confused in the minds of 
experimenters, and the present paper was the author’s first attempt 
at setting out the rational principles on which he might proceed. 
The paper is a precursor to the book on the Design of Experiments 
published nine years later. 


* Reprinted from Journal of Ministry of Agriculture, Vol. XXXIII, pp. 503-513, 
1926. 





THE ARRANGEMENT OF FIELD 
EXPERIMENTS 

R. A. Fisher, Sc.D., 

Rothamsted Experimental Station. 

The Present Position. —The present position of the art of 
field experimentation is one of rather special interest. For 
more than fifteen years the attention of agriculturalists has 
been turned to the errors of field experiments. During this 
period, experiments of the uniformity trial type have demonstrated 
the magnitude and ubiquity of that class of error 
which cannot be ascribed to carelessness in measuring the 
land or weighing the produce, and which is consequently 
described as due to “soil heterogeneity”; much ingenuity 
has been expended in devising plans for the proper arrangement 
of the plots; and not without result, for there can be 
little doubt that the standard of accuracy has been materially, 
though very irregularly, raised. What makes the present 
position interesting is that it is now possible to demonstrate 
(a) that the actual position of the problem is very much more 
intricate than was till recently imagined, but that realising 
this (b) the problem itself becomes much more definite and 
(c) its solution correspondingly more rigorous. 

The conception which has made it possible to develop a 
new and critical technique of plot arrangement is that an 
estimate of field errors derived from any particular experiment 
may or may not be a valid estimate, and in actual field practice 
is usually not a valid estimate, of the actual errors affecting 
the averages or differences of averages of which it is required 
to estimate the error. 

When is a Result Significant?—What is meant by a valid 
estimate of error? The answer must be sought in the use to 
which an estimate of error is to be put. Let us imagine in the 
broadest outline the process by which a field trial, such as 
the testing of a material of real or supposed manurial value, 
is conducted. To an acre of ground the manure is applied ; 
a second acre, sown with similar seed and treated in all other 
ways like the first, receives none of the manure. When the 
produce is weighed it is found that the acre which received 
the manure has yielded a crop larger indeed by, say, 10 per 
cent. The manure has scored a success, but the confidence 
with which such a result should be received by the purchasing 
public depends wholly upon the manner in which the experiment 
was carried out. 

The first criticism to be answered is—“ What reason is 
there to think that, even if no manure had been applied, the 
acre which actually received it would not still have given the 
higher yield?” The early experimenter would have had to 
reply merely that he had chosen the land fairly, that he had 
no reason to expect one acre to be better than the other, and 
(possibly) that he had weighed the produce from these two 
acres in previous years and had never known them to differ 
by 10 per cent. The last argument alone carries any weight. 
It will illustrate the meaning of tests of significance if we consider 
for how many years the produce should have been 
recorded in order to make the evidence convincing. 

First, if the experimenter could say that in twenty years’ 
experience with uniform treatment the difference in favour of 
the acre treated with manure had never before touched 10 per 
cent., the evidence would have reached a point which may be 
called the verge of significance; for it is convenient to draw 
the line at about the level at which we can say: “Either 
there is something in the treatment, or a coincidence has 
occurred such as does not occur more than once in twenty 
trials.” This level, which we may call the 5 per cent. point, 
would be indicated, though very roughly, by the greatest 
chance deviation observed in twenty successive trials. To 
locate the 5 per cent. point with any accuracy we should need 
about 500 years’ experience, for we could then, supposing no 
progressive changes in fertility were in progress, count out the 
twenty-five largest deviations and draw the line between the 
twenty-fifth and the twenty-sixth largest deviation. If the 
difference between the two acres in our experimental year 
exceeded this value, we should have reasonable grounds for 
calling the result significant. 
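The counting procedure just described may be sketched in Python as follows. Two points are assumptions made for the illustration and are not specified in the text: the deviations are taken in magnitude, and the line is placed midway between the twenty-fifth and twenty-sixth largest. The record of 500 values used is artificial.

```python
def empirical_five_per_cent_point(deviations):
    """Line drawn between the 25th and 26th largest of 500 deviations."""
    if len(deviations) != 500:
        raise ValueError("the illustration assumes a record of 500 years")
    ordered = sorted((abs(d) for d in deviations), reverse=True)
    # ordered[24] is the 25th largest, ordered[25] the 26th largest
    return (ordered[24] + ordered[25]) / 2.0


# Artificial record, used only to show the counting
record = list(range(1, 501))
line = empirical_five_per_cent_point(record)
print(line)
```

A difference in the experimental year would then be called significant if it exceeded the value so found.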

If one in twenty does not seem high enough odds, we may, 
if we prefer it, draw the line at one in fifty (the 2 per cent. 
point), or one in a hundred (the 1 per cent. point). Personally, 
the writer prefers to set a low standard of significance at the 
5 per cent. point, and ignore entirely all results which fail to 
reach this level. A scientific fact should be regarded as experimentally 
established only if a properly designed experiment 
rarely fails to give this level of significance. The very high 
odds sometimes claimed for experimental results should usually 
be discounted, for inaccurate methods of estimating error 
have far more influence than has the particular standard of 
significance chosen. 

Since the early experimenter certainly could not have 
produced a record of 500 years’ yields, the direct test of 
significance fails; nevertheless if he had only ten previous 
years’ records he might still make out a case, if he could claim 
that under uniform treatment, the difference had never come 
near to 10 per cent. His argument is now much less direct; 
he wishes to convince us that such an error as 10 per cent. 
would occur by chance in less than 5 per cent. of fair trials, 
and he can only appeal to ten trials. On the other hand, 
for those ten years he knows the actual value of the error. 
From these he can calculate a standard error, or rather an 
estimate of the standard error, to which the experiment is 
subject; and, if the observed difference is many times greater 
than this standard error, he claims that it is significant. At 
how many times greater should he draw the line? This factor 
depends on the amount of experience upon which the standard 
error is based. If on ten values, we look in the appropriate 
published table for “the 5 per cent. value of t, when 
n = 10” and find (1, p. 137) the value 2.228. If, then, the 
standard error is only 3 per cent., the 5 per cent. point is at 
6.684 per cent., and we can admit significance for a difference 
of 10 per cent. 

If we thus put our trust in the theory of errors, all the 
calculation necessary is to find the standard error. In the simple 
case chosen above (in which, for simplicity, it is assumed that 
each of the two acres beats the other equally often) all that 
is necessary is to multiply each of the ten errors by itself, 
thus forming its square, to find the average of the ten squares 
and to find the square root of the average. The average of 
the ten squares is called the variance, and its square root is 
called the standard error. The procedure outlined above, 
relying upon the theory of errors, involves some assumptions 
about the nature of field errors ; but these assumptions are 
not in fact disputed, and have been extensively verified in 
the examination of the results of uniformity trials. 
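The arithmetic of the last two paragraphs may be set out as a short sketch in Python. The ten yearly errors used are hypothetical, chosen only so that the standard error comes to 3 per cent.; the multiplier 2.228 is the tabulated value quoted in the text for ten values, and the function names are illustrative.

```python
import math

T_FIVE_PER_CENT_TEN_VALUES = 2.228  # tabulated value quoted in the text


def standard_error(errors):
    """Square root of the average of the squared errors (the variance)."""
    variance = sum(e * e for e in errors) / len(errors)
    return math.sqrt(variance)


def five_per_cent_point(errors, t_value=T_FIVE_PER_CENT_TEN_VALUES):
    """Difference which must be exceeded to reach the 5 per cent. level."""
    return t_value * standard_error(errors)


# Hypothetical record: ten yearly differences (per cent.) between the acres
errors = [3.0, -3.0, 3.0, -3.0, 3.0, -3.0, 3.0, -3.0, 3.0, -3.0]
threshold = five_per_cent_point(errors)
print(round(threshold, 3), 10.0 > threshold)
```

With a standard error of 3 per cent. the line falls at 6.684 per cent., so that an observed difference of 10 per cent. is admitted as significant.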

Measurement of Accuracy by Replication. —It would be 
exceedingly inconvenient if every field trial had to be preceded 
by a succession of even ten uniformity trials ; consequently, 
since the only purpose of these trials is to provide an estimate 
of the standard error, means have been devised for obtaining 
such an estimate from the actual yields of the trial year. 






The method adopted is that of replication. If we had 
challenged, as before, the result of an experiment performed, 
say, ten years ago, we should not probably have been referred 
to the experience of previous years, but should have learnt 
that each trial acre was divided into, say, four separate 
quarters; and that the two acres were systematically intermingled 
in eight strips arranged ABBA ABBA, where A is the 
manured portion, and B the unmanured.* 

Besides affording an estimate of error such intermingling of 
experimental plots is of value in diminishing the actual error 
representing the difference in actual fertility between the two 
acres. For it is obvious that such differences in fertility will 
generally be greater in whole blocks of land widely separated, 
than in narrow adjacent strips. This important advantage of 
reducing the standard error of the experiment has often been 
confused with the main purpose of replication in providing an 
estimate of error; and, in this confusion, types of systematic 
arrangement have been introduced and widely employed 
which provide altogether false estimates of error, because the 
conditions, upon which a replicated experiment provides a 
valid estimate of error, have not been adhered to. 

Errors Wrongly Estimated.—The error of which an estimate 
is required is that in the difference in yield between the area 
marked A and the area marked B, i.e., it is an error in the 
difference between plots treated differently in respect of the 
manure tested. The estimate of error afforded by the replicated 
trial depends upon differences between plots treated alike. 
An estimate of error so derived will only be valid for its 
purpose if we make sure that, in the plot arrangement, pairs of 
plots treated alike are not nearer together, or further apart 
than, or in any other relevant way, distinguishable from 
pairs of plots treated differently. Now in nearly all systematic 
arrangements of replicated plots care is taken to put the 
unlike plots as close together as possible, and the like plots 
consequently as far apart as possible, thus introducing a 
flagrant violation of the conditions upon which a valid estimate 
is possible. 

One way of making sure that a valid estimate of error will 
be obtained is to arrange the plots deliberately at random, 
so that no distinction can creep in between pairs of plots 
treated alike and pairs treated differently; in such a case 
an estimate of error, derived in the usual way from the 
variations of sets of plots treated alike, may be applied to test 
the significance of the observed difference between the averages 
of plots treated differently. 

* This principle was employed in an experiment on the influence of 
weather on the effectiveness of phosphates and nitrogen alluded to by 
Sir John Russell (3). The author must disclaim all responsibility for the 
design of this experiment, which is, however, a good example of its 
class. 

The estimate of error is valid, because, if we imagine a large 
number of different results obtained by different random 
arrangements, the ratio of the real to the estimated error, 
calculated afresh for each of these arrangements, will be 
actually distributed in the theoretical distribution by which 
the significance of the result is tested. Whereas if a group of 
arrangements is chosen such that the real errors in this group 
are on the whole less than those appropriate to random 
arrangements, it has now been demonstrated that the errors, 
as estimated, will, in such a group, be higher than is usual in 
random arrangements, and that, in consequence, within such 
a group, the test of significance is vitiated. It is particularly 
to be noted that those methods of arrangement, at which 
experimenters have consciously aimed, and which reduce the 
real errors, will appear from their (falsely) estimated standard 
errors to be not more but less accurate than if a random 
arrangement had been applied; whereas, if the experimenter 
is sufficiently unlucky, as must often be the case, to increase 
by his systematic arrangement the real errors, then the 
(falsely) estimated standard error will now be smaller, and will 
indicate that the experiment is not less, but more accurate. 
Opinions will differ as to which event is, in the long run, the 
more unfortunate; it is evident that in both cases quite 
misleading conclusions will be drawn from the experiment. 

A Necessary Distinction.—The important question will be 
asked at this point as to whether it is necessary, in order to 
obtain a valid estimate of error, to give up all the advantage 
in accuracy to be obtained from growing plots, which it is 
desired to compare, as closely adjacent as possible. The 
answer is that it is not necessary to give up any such 
advantage. Two things are necessary, however: (a) that a 
sharp distinction should be drawn between those components 
of error which are to be eliminated in the field, and those which 
are not to be eliminated; and that while the elimination of 
the one class shall be complete, no attempt shall be made to 
eliminate the other; (b) that the statistical process of the 
estimation of error shall be modified so as to take account of 
the field arrangement, and so that the components of error 
actually eliminated in the field shall equally be eliminated in 
the statistical laboratory. 

In reconciling thus the two desiderata of the reduction of 
error and of the valid estimation of error, it should be emphasised 
that no principle is in the smallest degree compromised. 
An experiment either admits of a valid estimate of error, or it 
does not; whether it does so, or not, depends not on the 
actual arrangement of plots, but only on the way in which 
that arrangement was arrived at. If the arrangement 
ABBAABBA was arrived at by writing down a succession of 
“sandwiches” ABBA, it does not admit of any estimate of 
certain validity, although “Student” (2) has shown reasons 
to think that by treating each “sandwich” as a unit, the uncertainties 
of the situation are much reduced. If, however, 
the same arrangement happened to occur subject to the 
conditions that each pair of strips shall contain an A and a B, 
but that which came first shall be decided by the toss of a coin, 
then a valid estimate may be obtained from the four differences 
in yield in the four pairs of strips. It is not now the 
“sandwiches” but the pairs of strips which provide independent 
units, and these units are double the 
number of the “sandwiches.” 
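The coin-tossing rule for pairs of strips may be sketched in Python as follows. The function names, the seed and the yields are hypothetical and serve only to show how the four differences are formed; with four pairs, any one particular order, such as ABBAABBA, arises once in sixteen trials on the average.

```python
import random


def randomized_pairs(n_pairs, rng):
    """Each pair of adjacent strips holds one A and one B; a coin toss
    decides which comes first."""
    return ["AB" if rng.random() < 0.5 else "BA" for _ in range(n_pairs)]


def pair_differences(layout, yields):
    """Difference A minus B within each pair; yields are given strip by strip."""
    diffs = []
    for i, pair in enumerate(layout):
        first, second = yields[2 * i], yields[2 * i + 1]
        diffs.append(first - second if pair == "AB" else second - first)
    return diffs


rng = random.Random(1926)  # seed fixed only so that the sketch is repeatable
layout = randomized_pairs(4, rng)

# Hypothetical yields for the eight strips of the arrangement ABBAABBA
sandwich_layout = ["AB", "BA", "AB", "BA"]
strip_yields = [10, 8, 7, 9, 11, 10, 6, 9]
print(layout, pair_differences(sandwich_layout, strip_yields))
```

The four differences so obtained are the independent units from which the estimate of error is calculated.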

Moreover, if the experiment is repeated, either by replication 
on the same field, or at different farms scattered over the 
country, the arrangement must be obtained afresh by chance 
for each replication, so that in only a small and calculable 
proportion of cases will the sandwich arrangement be 
reproduced. 

Thus validity of estimation can be guaranteed by appropriate 
methods of arrangement, and on the other hand there is 
reason to think that well-designed experiments, yielding a 
valid estimate of error, and therefore capable of genuine 
significance tests, will give actual errors as small as even the 
most ingenious of systematic arrangements. It is difficult 
to prove this assertion save by experimenting on the data 
provided by uniformity trials, because, in the absence of any 
satisfactory estimate of error, it is impossible to tell for certain 
how accurate, or inaccurate, such systematic arrangements 
really are ; while the aggregate of the uniformity trial data, 
hitherto available, is scarcely adequate for any such test. 
What can be said for certain is, that experiments capable 
of genuine tests of significance can easily be designed to be 
very much more accurate than any experiments ordinarily 
conducted. 

A Useful Method.—The distinction between errors eliminated 
in the field, and the errors which are to be carefully randomized 
in order to provide a valid estimate of the errors which cannot 
be eliminated, may be made most clear by one of the most 
useful and flexible types of arrangement, namely, the arrangement 
in “randomized blocks.” Let us suppose that five 
different varieties are to be tested, and that it is decided to 
give each variety seven plots, making thirty-five in all. It 
would be a perfectly valid experiment to divide the land into 
thirty-five equal portions, in any way we pleased, and then 
to assign seven portions chosen wholly at random to each 
treatment. In such a case, as has been stated above, no 
modification is introduced in the process of estimating the 
standard error from the results, for no portion of the field 
heterogeneity has been eliminated. On most land, however, 
we shall obtain a smaller standard error, and consequently a 
more valuable experiment, if we proceed otherwise. The 
land is divided first into seven blocks, which, for the present 
purpose, should be as compact as possible; each of these 
blocks is divided into five plots, and these are assigned in each 
case to the five varieties, independently, and wholly at random. 
If this is done, those components of soil heterogeneity which 
produce differences in fertility between plots of the same block 
will be completely randomized, while those components which 
produce differences in fertility between different blocks will 
be completely eliminated. In calculating an estimate of error 
from such an experiment, care must of course be taken to 
eliminate the variance due to differences between blocks, 
and for this purpose exact methods have been developed 
(1, pp. 176-232). 
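The arrangement in randomized blocks, and the removal of block differences from the estimate of error, may be illustrated by the following sketch in Python. The residual mean square shown is the ordinary two-way calculation and is offered as an assumption about the kind of method intended, not as a transcription of the procedure of reference (1); the function names are illustrative.

```python
import random


def randomized_blocks(varieties, n_blocks, rng):
    """Assign the full set of varieties afresh, at random, within every block."""
    layout = []
    for _ in range(n_blocks):
        block = list(varieties)
        rng.shuffle(block)
        layout.append(block)
    return layout


def residual_mean_square(yields):
    """yields[b][v] is the yield of variety v in block b. Block means and
    variety means are both removed before the squares are summed."""
    b, v = len(yields), len(yields[0])
    grand = sum(map(sum, yields)) / (b * v)
    block_means = [sum(row) / v for row in yields]
    variety_means = [sum(yields[i][j] for i in range(b)) / b for j in range(v)]
    ss = sum((yields[i][j] - block_means[i] - variety_means[j] + grand) ** 2
             for i in range(b) for j in range(v))
    return ss / ((b - 1) * (v - 1))


# Five varieties in seven blocks, thirty-five plots in all
layout = randomized_blocks("ABCDE", 7, random.Random(0))
for block in layout:
    print(" ".join(block))
```

Differences between whole blocks, however large, contribute nothing to the residual mean square, which is the sense in which they are eliminated in the statistical laboratory as well as in the field.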

Most experimenters on carrying out a random assignment 
of plots will be shocked to find how far from equally the plots 
distribute themselves; three or four plots of the same variety, 
for instance, may fall together at the corner where four blocks 
meet. This feeling affords some measure of the extent to 
which estimates of error are vitiated by systematic regular 
arrangements, for, as we have seen, if the experimenter 
rejects the arrangement arrived at by chance as altogether 
“too bad,” or in other ways “cooks” the arrangement to 
suit his preconceived ideas, he will either (and most probably) 
increase the standard error as estimated from the yields; 
or, if his luck or his judgment is bad, he will increase the real 
errors while diminishing his estimate of them. 


The Latin Square.—For the purpose of variety trials, and 
of those simple types of manurial trial in which every possible 
comparison is of equal importance, the problem of designing 
economical and effective field experiments, reduces to two main 
principles (i) the division of the experimental area into 
plots as small as possible subject to the type of farm machinery 
used, and to adequate precautions against edge effect; (ii) the 
use of arrangements which eliminate a maximum fraction of 
the soil heterogeneity, and yet provide a valid estimate of the 
residual errors. Of these arrangements, by far the most 
efficient, as judged by experiments upon uniformity trial data, 
is that which the writer has named the Latin Square. 

Systematic arrangements in a square, in which the number 
of rows and of columns is equal to the number of varieties, 
such as 


A B C D E 
E A B C D 
D E A B C 
C D E A B 
B C D E A 


A B C D E 
D E A B C 
B C D E A 
E A B C D 
C D E A B 

have been used previously for variety trials in, for example, 
Ireland and Denmark; but the term “Latin Square” should 
not be applied to any such systematic arrangements. The 
problem of the Latin Square, from which the name was 
borrowed, as formulated by Euler, consists in the enumeration 
of every possible arrangement, subject to the conditions that 
each row and each column shall contain one plot of each 
variety. Consequently, the term Latin Square should only be 
applied to a process of randomization by which one is selected 
at random out of the total number of Latin Squares possible; 
or, at least, to specify the agricultural requirement more 
strictly, out of a number of Latin Squares in the aggregate, 
of which every pair of plots, not in the same row or column, 
belongs equally frequently to the same treatment. 
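The enumeration, and the choice of one square at random from the whole number possible, may be illustrated for a small case by the sketch below in Python. The 4 by 4 square is used only because its enumeration is short; direct enumeration of this kind is an assumption made for the illustration and soon becomes impracticable for larger squares. The function names are hypothetical.

```python
from itertools import permutations
import random


def is_latin(square):
    """True if every row and every column holds each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


def all_latin_squares(symbols):
    """Enumerate every Latin square on the symbols, building it row by row."""
    rows = list(permutations(symbols))
    n = len(symbols)

    def extend(partial):
        if len(partial) == n:
            yield tuple(partial)
            return
        for row in rows:
            # a new row may not repeat any symbol already used in its column
            if all(row[c] != prev[c] for prev in partial for c in range(n)):
                yield from extend(partial + [row])

    yield from extend([])


squares_4 = list(all_latin_squares("ABCD"))
chosen = random.Random(0).choice(squares_4)  # one square selected at random
systematic = ["ABCDE", "EABCD", "DEABC", "CDEAB", "BCDEA"]
print(len(squares_4), is_latin(chosen), is_latin(systematic))
```

The systematic square satisfies the conditions on rows and columns equally well; what distinguishes the Latin Square in the sense of the text is the random selection, not the pattern selected.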

The actual laboratory technique for obtaining a Latin 
Square of this random type, will not be of very general interest, 
since it differs for 5 × 5 and 6 × 6 squares, these being by far 
the most useful sizes. They may be obtained quite rapidly, 
and the Statistical Laboratory at Rothamsted is prepared to 
supply these, or other types of randomized arrangements, to 
intending experimenters; this procedure is considered the 
more desirable since it is only too probable that new principles 
will, at their inception, be, in some detail or other, misunderstood 
and misapplied; a consequence for which their 
originator, who has made himself responsible for explaining 
them, cannot be held entirely free from blame. 

Complex Experimentation.—Only a minority of field experiments 
are of the simple type, typified by variety trials, in 
which all possible comparisons are of equal importance. In 
most experiments involving manuring or cultural treatment, 
the comparisons involving single factors, e.g., with or without 
phosphate, are of far higher interest and practical importance 
than the much more numerous possible comparisons involving 
several factors. This circumstance, through a process of 
reasoning, which can best be illustrated by a practical example, 
leads to the remarkable consequence that large and complex 
experiments have a much higher efficiency than simple ones. 
No aphorism is more frequently repeated in connection with 
field trials, than that we must ask Nature few questions, or, 
ideally, one question, at a time. The writer is convinced that 
this view is wholly mistaken. Nature, he suggests, will best 
respond to a logical and carefully thought out questionnaire; 
indeed, if we ask her a single question, she will often refuse to 
answer until some other topic has been discussed. 

A good example of a complex experiment with winter oats 
is being carried out by Mr. Eden at Rothamsted this year, 
and is shown in the diagram. 

Nitrogenous manure in the form of Sulphate (S), or Muriate 
(M) of ammonia, is applied as a top dressing early, or late in 
the season, in quantities represented by 0, 1, 2. When no 
manure is applied, we cannot, of course, distinguish between 
sulphate and chloride, or between early and late applications; 
nevertheless, since the general comparison 0 versus 1 dose is one 
of the important comparisons to be made, the number of plots 
receiving no nitrogenous manure (corresponding roughly to 
the so-called “control” plots of the older experiments) are 
made to be equal in number to those plots receiving one or 
two doses. This makes twelve treatments, and these are 
replicated in the above sketch in eight randomized blocks. 
Note what a “bad” distribution chance often supplies; 
the chloride plots are all bunched together in the middle of 
the first block, while they form a solid band across the top 
block on the right; in the bottom block on the right, too, 
all the early plots are on one side, and all the late plots on the 
other. 
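The make-up of the twelve treatments, and their random arrangement within each of the eight blocks, may be sketched in Python as follows. The representation of a treatment as a triple (dose, form, time) and the function names are assumptions made for the illustration.

```python
import random


def treatments():
    """Twelve plot treatments: four plots without nitrogen, and the eight
    combinations of dose (1, 2), form (S, M) and time (EARLY, LATE)."""
    dressed = [(dose, form, time)
               for dose in (1, 2)
               for form in ("S", "M")
               for time in ("EARLY", "LATE")]
    # with no manure, form and time of application cannot be distinguished
    undressed = [(0, None, None)] * 4
    return undressed + dressed


def field_plan(n_blocks, rng):
    """Arrange the twelve treatments afresh, at random, within each block."""
    plan = []
    for _ in range(n_blocks):
        block = treatments()
        rng.shuffle(block)
        plan.append(block)
    return plan


plan = field_plan(8, random.Random(0))
print(sum(len(block) for block in plan))
```

Each block thus contains four plots at each of the three quantities, and the eight blocks together make up the 96 plots of the experiment.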




[Figure: field plan of eight randomized blocks of twelve plots each; every plot is labelled with its dose (0, 1 or 2), the form of manure (S or M) and the time of application (EARLY or LATE).] 

Fig. 1.—A Complex Experiment with Winter Oats. 


The value of such large and complex experiments is that 
all the necessary comparisons can be made with known and 
with, probably, high accuracy; any general difference between 
sulphate and chloride, between early and late application, or 
ascribable to quantity of nitrogenous manure, can be based on 
thirty-two comparisons, each of which is affected only by 
such soil heterogeneity as exists between plots in the same 
block. To make these three sets of comparisons only, with 
the same accuracy, by single question methods, would require 
224 plots, against our 96; but in addition many other comparisons 
can also be made with equal accuracy, for all combinations 
of the factors concerned have been explored. 
Most important of all, the conclusions drawn from the single-factor 
comparisons will be given, by the variation of non-essential 
conditions, a very much wider inductive basis than 
could be obtained, by single question methods, without 
extensive repetitions of the experiment. 

In the above instance no possible interaction of the factors is 
disregarded; in other cases it will sometimes be advantageous 
deliberately to sacrifice all possibility of obtaining information 
on some points, these being believed confidently to be unimportant, 
and thus to increase the accuracy attainable on 
questions of greater moment. The comparisons to be sacrificed 
will be deliberately confounded with certain elements of the 
soil heterogeneity, and with them eliminated. Some additional 
care should, however, be taken in reporting and explaining the 
results of such experiments. 

References.—(1) R. A. Fisher: Statistical Methods for Research 
Workers. (Oliver & Boyd, Edinburgh. 1925); (2) “Student”: On 
Testing Varieties of Cereals. (Biometrika, XV, pp. 271-293, 1923); 
(3) Sir John Russell: Field Experiments: How They are Made and 
What They are. (Jour. Min. Agric., XXXII, 1926, pp. 989-1001.) 




18 

STUDIES IN CROP VARIATION. VI. EXPERI¬ 
MENTS ON THE RESPONSE OF THE POTATO 
TO POTASH AND NITROGEN 


AUTHOR’S NOTE 

This paper is now of interest principally as showing the gradual development 
of ideas on experimental design appropriate to agricultural 
experiments. 


* Reprinted from Journal of Agricultural Science, Vol. XIX, Pt. II, pp. 201-213, 
1929. 





STUDIES IN CROP VARIATION. 

VI. EXPERIMENTS ON THE RESPONSE OF THE POTATO TO 
POTASH AND NITROGEN. 

By T. EDEN, M.Sc., and R. A. FISHER, Sc.D. 

(Rothamsted Experimental Station, Harpenden.) 

(With Four Text-figures.) 

In a previous paper (1) the authors have described in detail the design, 
statistical calculations, and advantages of a method of field experimentation 
which, on its theoretical side, is based upon the analysis of 
variance (2). The method is capable of expansion and elaboration in 
several directions, and the purpose of this paper is to put on record 
three further examples of experiments in which the new technique has 
been employed. 

As before, it has seemed to us best to present the details of this 
method in the form of an actual description of experiments themselves 
rather than as abstract examples. Such a procedure has in this case an 
added advantage since the three examples chosen follow one another 
logically, and were each the result of a realisation of both the limitations 
and the advantages of prior attempts. A consideration of these trials 
will, it is hoped, enable the experimenter to appreciate the advantages 
of planning his experiments so as not only to embody an agricultural 
question, but also to ensure the most accurate decision possible. 

When the time was opportune for applying in practice some of the 
advances in experimental method then available, two of the most 
important investigations being carried out in the field at Rothamsted, 
were concerned with 

(1) the qualitative aspect of potash manuring, 

(2) the interaction of potash and nitrogen. 

The crop selected was the potato, and it was to these investigations 
that the new methods were first applied. In the first two years the 
variety was “Kerr’s Pink” and in 1927 “Arran Comrade.” The qualitative 
aspect of the investigation gave rise to a design which may be 
designated Type I. 

Type I. 

The problem at hand was to design an experiment capable of
distinguishing the differential effects (if any) of potash applied as sulphate,
muriate, and low grade salt (i.e. a sylvinite containing, in addition to
potassium chloride, a high percentage of sodium chloride). There were
thus three treatments, to which a fourth was added with no potash,
this being in the nature of a safeguarding plot to ensure that an apparent
equality in the efficacy of the three forms of potash was not in reality
due to non-effectiveness and consequent lack of response.

The necessity of a high standard of accuracy to distinguish between 
equivalent dressings of various forms of the same nutrient together with 
the smaller number of comparisons attempted, led to the adoption of 
the Latin square arrangement of plots. The principal features of the 
design have been described elsewhere (2) but may be repeated here. 

Each Latin square experiment contains as many replications as
there are treatments. In each row and in each column of the square,
each treatment occurs once and once only. It is in this respect that it
is a square, for in actual shape it may vary from a true square to a
rather pronounced rectangle. The actual allocation of the position of
any treatment within its row or column is, apart from this one
restriction, determined by chance. It thus follows that for a 4 by 4 Latin
square, for instance, there are 576 alternative arrangements. An actual
experiment is a random choice from the total possible arrangements.
Though not completely randomised with respect to plot arrangement,
this design possesses complete randomness with respect to the elements
of variation used in testing significance. Fig. 1 shows the actual
arrangements employed during the two years of the trial. The details of the
treatment are as follows:

Basal manuring: Superphosphate 6 cwt. per acre; sulphate of ammonia 
2 cwt. per acre. 

Potash in the form of sulphate, muriate or low grade salt: The 
equivalent of 2 cwt. per acre of sulphate of potash. 
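
The number of 4 by 4 Latin squares can be checked by direct enumeration,
and a plan can then be drawn at random from the complete set. The
following is only a sketch: the function name, the seed, and the idea of
listing every square before drawing one are assumptions made for
illustration, not a record of how the plans of Fig. 1 were actually drawn.

```python
from itertools import permutations
import random


def latin_squares(n):
    """Yield every n-by-n Latin square as a tuple of rows of symbols 0..n-1."""
    rows = list(permutations(range(n)))

    def extend(square):
        if len(square) == n:
            yield tuple(square)
            return
        for row in rows:
            # a new row is admissible if it repeats no symbol in any column
            if all(row[c] != prev[c] for prev in square for c in range(n)):
                yield from extend(square + [row])

    yield from extend([])


squares = list(latin_squares(4))
count = len(squares)

# Draw one arrangement at random and label it with the four treatments.
treatments = "OSMP"  # no potash, sulphate, muriate, potash manure salts
rng = random.Random(1929)  # arbitrary seed, only to make the draw repeatable
chosen = rng.choice(squares)
plan = ["".join(treatments[s] for s in row) for row in chosen]

print(count)
print("\n".join(plan))
```

Every square in the list has the same chance of being chosen, which is the
sense in which the treatment comparisons remain randomised.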

            1925                                 1926

   M     P     O     S             M       S       O       P
  444   422   173   398          584.0   557.0   461.5   498.5

   O     S     M     P             S       P       M       O
  279   439   423   409          519.5   485.5   477.0   389.0

   P     M     S     O             P       O       S       M
  436   428   445   212          474.5   378.5   467.5   491.5

   S     O     P     M             O       M       P       S
  453   237   410   393          464.0   511.0   507.0   492.0

Fig. 1. Arrangement of two Latin squares with yields in lb. per plot.






Applying the analysis of variance to this arrangement it is possible
to obtain a variance due to:

(1) Treatment,

(2) Position,

(3) Random variation of parallels.



Table I. Total yields (1925).

     Rows           Columns                 Treatments

   1   1437        1   1612       (O) No potash              901
   2   1550        2   1526       (S) Sulphate              1735
   3   1521        3   1451       (M) Muriate               1688
   4   1493        4   1412       (P) Potash manure salts   1677

Table II. Analysis of variance (1925).

                          Degrees
Variance due to          of freedom   Sums of squares   Mean square
Treatments:
  Potash v. no potash         1           119,700         119,700
  Potash manures              2               475             237
                              3           120,175          40,058
Rows                          3             1,740             580
Columns                       3             5,841           1,947
Parallels                     6             1,995             333
Total                        15           129,751
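
The items of Table II can be recomputed from the plot yields of the 1925
square. The sketch below takes the yields as they stand in Fig. 1, with
the plot in the fourth row and first column read as 453 so that the row,
column and treatment totals agree with Table I; the variable names are
illustrative only.

```python
# Layout and yields (lb. per plot) of the 1925 Latin square.
layout = ["MPOS", "OSMP", "PMSO", "SOPM"]
yields = [
    [444, 422, 173, 398],
    [279, 439, 423, 409],
    [436, 428, 445, 212],
    [453, 237, 410, 393],
]

n = 4
grand = sum(map(sum, yields))
cf = grand ** 2 / (n * n)  # correction for the mean


def ss_from_totals(totals, plots_per_total):
    """Sum of squares between totals that each rest on the same number of plots."""
    return sum(t * t for t in totals) / plots_per_total - cf


row_totals = [sum(r) for r in yields]
col_totals = [sum(yields[i][j] for i in range(n)) for j in range(n)]
trt_totals = {}
for i in range(n):
    for j in range(n):
        trt_totals[layout[i][j]] = trt_totals.get(layout[i][j], 0) + yields[i][j]

ss_total = sum(y * y for r in yields for y in r) - cf
ss_rows = ss_from_totals(row_totals, n)
ss_cols = ss_from_totals(col_totals, n)
ss_trt = ss_from_totals(trt_totals.values(), n)
ss_error = ss_total - ss_rows - ss_cols - ss_trt

# Single degree of freedom: the three potash plots together against no potash.
potash_total = grand - trt_totals["O"]
ss_potash_v_none = trt_totals["O"] ** 2 / n + potash_total ** 2 / (3 * n) - cf
ss_between_potash = ss_trt - ss_potash_v_none

ms_error = ss_error / ((n - 1) * (n - 2))  # 6 degrees of freedom
print(round(ss_trt), round(ss_rows), round(ss_cols), round(ss_total))
```

The treatment sum of squares divides into one degree of freedom for potash
against no potash and two for the potash manures among themselves, which
is the separation shown in the table.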


Table III. Total yields (1926).

     Rows             Columns                 Treatments

   1   2101.0        1   2042.0     (O) No potash             1693.0
   2   1871.0        2   1932.0     (S) Sulphate              2036.0
   3   1812.0        3   1913.0     (M) Muriate               2063.5
   4   1974.0        4   1871.0     (P) Potash manure salts   1965.5

Table IV. Analysis of variance (1926).

                          Degrees
Variance due to          of freedom   Sums of squares   Mean square
Treatments:
  Potash v. no potash         1            20,254          20,254
  Potash manures              2             1,278             639
                              3            21,532           7,177
Rows                          3            12,055           4,018
Columns                       3             3,989           1,330
Parallels                     6             2,065             344
Total                        15            39,641


The positional variance is here of the utmost importance since it
can be calculated in two directions, as variance of the rows, i.e.
variability from top to bottom, and variance of columns (variability from
side to side). In this experiment the number of degrees of freedom
assigned to each positional factor is three, and since any variation in
column totals does not affect the variation in the row totals, the estimates
of the variances of the two are independent, and for the purposes of
elimination of positional variances are additive. In other words, a
considerable amount more of positional variance can be taken out by
using this arrangement as a Latin square than by using it merely as
four blocks of four treatments. The actual magnitudes of the variances
are given in Tables II and IV.

One point may be mentioned here, which, although not demonstrated
by these analyses, is sometimes likely to arise. Treated as a
Latin square, the data provide six degrees of freedom for the estimation
of error. If the variance of either rows or columns had been sacrificed,
there would however have been not six but nine available, and the
sums of squares corresponding to the rejected three degrees of freedom
would have been absorbed in the remainder sums of squares. The
question will sometimes arise as to whether the gain effected by
eliminating the sums of squares of either rows or columns will
counterbalance the loss entailed by having three less degrees of freedom with
which to estimate the random error. In both the examples given it will
be seen that even though in 1925 the positional variance of the rows,
and in 1926 that of the columns, was small, the error is reduced by
eliminating them from the error calculations. But, in cases where this
is not so, the fact that the Latin square gives a larger error than blocks
with only a one directional variance, has sometimes been held to imply
a disadvantage. This is not altogether a fair criticism. One of the merits
of the method is the recognition that the arrangement of plots has a real
effect upon error estimation, and it makes use of that knowledge. If it
were possible to say beforehand that in one particular direction, owing
to the absence of a large component of soil heterogeneity in that
direction, the positional variance was negligible, then it would be
advantageous to use the simpler block arrangement. In point of fact, it is
seldom, if ever, with annuals, possible to rely on such a contingency,
even where a preliminary uniformity trial has been carried out, and the
Latin square arrangement is adopted to make sure that the residual
variance, on which hangs the precision of the experiment, shall not be
inflated from either source. When, in this way, certain elements of error
have been eliminated from the field results, the statistician has no
choice but to eliminate them in his estimate of error. To include a
portion of these because they make his estimate smaller would be to
miss the point of making an unbiassed estimate.
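
The comparison between six and nine degrees of freedom for error can be
made numerically. The sketch below uses the sums of squares of Tables II
and IV, the error items being taken as the total less treatments, rows
and columns; the function and dictionary names are illustrative.

```python
# Positional and error sums of squares for the two Latin squares.
tables = {
    1925: {"rows": 1740, "columns": 5841, "error": 1995},
    1926: {"rows": 12055, "columns": 3989, "error": 2065},
}


def error_mean_square(ss, pooled=()):
    """Error mean square when the named positional items are left in the error."""
    df = 6 + 3 * len(pooled)
    return (ss["error"] + sum(ss[name] for name in pooled)) / df


comparison = {}
for year, ss in tables.items():
    comparison[year] = {
        "latin_square": error_mean_square(ss),
        # rows not eliminated, so that the columns act as four blocks
        "columns_as_blocks": error_mean_square(ss, pooled=("rows",)),
        # columns not eliminated, so that the rows act as four blocks
        "rows_as_blocks": error_mean_square(ss, pooled=("columns",)),
    }

for year, result in comparison.items():
    print(year, {name: round(ms, 1) for name, ms in result.items()})
```

In both years the Latin square gives the smallest error mean square, even
for the direction in which the positional variance was small, the loss of
three degrees of freedom being more than repaid.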


These experiments have one defect which in some cases may be
hard to overcome. The inclusion of the no-potash plot (for the reasons
specified) does in a year of pronounced response to potash contribute
very largely to the treatment variance. In applying tests of significance,
particularly what is known as the z test, significance may be claimed
for treatment as a whole which is really due entirely to the one degree
of freedom potash versus no potash. A further analysis of the variance
due to treatment by separating this component, as in Tables II and IV,
will, of course, settle the point, but the need of this precaution, and the
possibility of carrying it out, requires, at the present time, some
emphasis.
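
The effect of separating the single degree of freedom may be seen by
working out z, which is half the natural logarithm of the ratio of a mean
square to the error mean square, for each line of Tables II and IV. This
is a sketch for illustration only; the names are hypothetical, and no
significance points are quoted since they must be taken from a table of z.

```python
import math


def z_value(mean_square, error_mean_square):
    """Fisher's z: half the natural logarithm of the ratio of two mean squares."""
    return 0.5 * math.log(mean_square / error_mean_square)


# Mean squares from Tables II and IV (error = "Parallels", 6 degrees of freedom).
mean_squares = {
    1925: {"all treatments": 40058, "potash v. no potash": 119700,
           "potash manures": 237, "error": 333},
    1926: {"all treatments": 7177, "potash v. no potash": 20254,
           "potash manures": 639, "error": 344},
}

z = {
    year: {name: z_value(ms, table["error"])
           for name, ms in table.items() if name != "error"}
    for year, table in mean_squares.items()
}

for year, values in z.items():
    for name, value in values.items():
        print(year, name, round(value, 3))
```

In both years the value for treatments as a whole is large only because of
the comparison with the no-potash plots; for the three potash manures
among themselves z is much smaller, and in 1925 it is negative.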

The benefit of the elimination of the disturbing element of soil
heterogeneity is clearly seen in Tables II and IV, if the positional
variance is amalgamated with the random variance as it would be if the old
methods of designing field experiments were followed. Table V shows
this advantage in terms of the standard error per cent., and of the
precision figure based thereon.

Table V. Advantage of eliminating a portion of the soil heterogeneity.

                           1925                        1926
                   Soil                        Soil
                   variation    With soil      variation    With soil
                   eliminated   variation      eliminated   variation
Standard error        2.4          3.8            1.9          4.0
Precision             17            7             27            6

In 1925 soil variation would have increased the error by more than
50 per cent. and in 1926 by more than 100 per cent. The influence on
the accuracy of an experiment which such an increase of error entails
is shown by the figure for precision. This value is arrived at as follows.
On a precision scale a 10 per cent. error, the approximate error of a
single plot with many crops, is assigned the value 1 and a 1 per cent.
error then has a precision value 100, this latter being the value at which
with our present resources it is reasonable to aim. The precision index I
will then be

$$I = \left(\frac{m}{10\,\sigma}\right)^{2},$$

where $\sigma$ is the standard deviation of the mean yield of each treatment,
and $m$ is the mean yield of all.
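
The entries of Table V can be reproduced from the earlier tables. The
sketch below assumes that the precision index is 100 divided by the square
of the percentage standard error, which is the reading consistent with the
scale just described (10 per cent. giving 1, and 1 per cent. giving 100),
and that the error sums of squares are the totals less the other items.
All names are illustrative.

```python
import math

REPLICATES = 4  # plots per treatment in each Latin square
PLOTS = 16

# Grand totals (Tables I and III) and sums of squares (Tables II and IV).
years = {
    1925: {"total_yield": 6001.0, "rows": 1740, "columns": 5841, "error": 1995},
    1926: {"total_yield": 7758.0, "rows": 12055, "columns": 3989, "error": 2065},
}


def standard_error_per_cent(error_ss, error_df, mean_yield):
    """Standard error of a treatment mean, as a percentage of the general mean."""
    sigma = math.sqrt(error_ss / error_df / REPLICATES)
    return 100.0 * sigma / mean_yield


def precision(se_per_cent):
    """Precision index; algebraically the same as (m / (10 * sigma)) ** 2."""
    return 100.0 / se_per_cent ** 2


table_v = {}
for year, d in years.items():
    m = d["total_yield"] / PLOTS
    eliminated = standard_error_per_cent(d["error"], 6, m)
    # positional sums of squares thrown back into the error (12 d.f.)
    retained = standard_error_per_cent(d["error"] + d["rows"] + d["columns"], 12, m)
    table_v[year] = {
        "se_eliminated": eliminated,
        "se_with_soil": retained,
        "precision_eliminated": precision(eliminated),
        "precision_with_soil": precision(retained),
    }

for year, row in table_v.items():
    print(year, {name: round(value, 1) for name, value in row.items()})
```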

Type II. 

The Latin square form of experiment answered admirably for the
foregoing qualitative distinctions between potash manures, but was
unsuited to the investigation into the quantitative relationships between
potash and nitrogen.






To serve any useful purpose, this had to include several increments 
of potash with corresponding increments of nitrogen combined in as 
many ways as the size of the experiment would allow. To have carried 
out an experiment involving so large a number of treatments in the form 
of a Latin square, would have been very wasteful of space and effort. 
Past a certain point with the Latin square the increase in replication 
does not bring about a decrease in error commensurate with the labour 
involved. There are indications that comparisons of more than seven
treatments or varieties can be made more precisely with other
arrangements. Accordingly, when in 1925 twelve treatments were contemplated,
positional variance was eliminated by assigning each treatment to each
of four similar blocks, the arrangement of which was substantially
the same as that of the top dressing series previously referred to (1).
Within the blocks, however, the arrangement of the plots was not a
random one.


   M     N     C     R     D     A
  491   328   340   508   388   322
                                          Block I
   L     J     Q     P     T     S
  437   217   487   464   272   516

   P     S     R     C     N     M
  450   464   461   320   298   482
                                          Block II
   T     D     A     L     Q     J
  252   352   281   438   515   315

   C     P     S     M     J     N
  341   439   456   466   247   344
                                          Block III
   T     L     D     R     A     Q
  226   393   338   519   198   501

   M     A     N     P     D     T
  449   191   185   472   342   234
                                          Block IV
   Q     R     J     L     C     S
  461   475   157   377   298   441

Fig. 2. Quantitative experiment of 1925, yields in lb. per plot.


The actual arrangement of the blocks as shown in Fig. 2 was
determined by the knowledge that there is a high correlation between
adjacent plots.

In 1926 a similar experiment was carried out (Fig. 3). The actual
treatments involved in these trials are shown in the following plan where
the letters indicate the treatments employed.

                 1925                                   1926

Sulphate of   Sulphate of potash, cwt.   Sulphate of   Sulphate of potash, cwt.
ammonia,                                 ammonia,
cwt.            0    2    4    6         cwt.            0    1    2    4

   0            A    C    D    -            0            A    B    C    D
   2            J    L    M    -            1            E    F    G    H
   4            N    P    Q    R            2            J    K    L    M
   6            -    -         S            4            N    O    P    Q

    N      J      F      A      D      O      K      A
  332.0  302.5  383.0  317.5  439.0  533.5  544.5  404.5

    K      Q      O      D      L      B      F      N
  444.5  568.0  450.0  381.5  483.5  308.0  434.0  468.0

    B      C      M      L      H      P      G      E
  363.0  368.0  449.0  471.5  422.0  500.0  402.0  318.0

    H      E      P      O      M      Q      C      J
  447.5  314.0  627.0  434.5  604.0  561.5  366.0  456.0

    A      L      J      C      P      Q      B      E
  351.5  495.5  443.0  383.5  659.0  660.0  369.0  305.5

    K      B      O      O      C      H      J      O
  472.5  367.5  455.5  602.5  328.5  390.5  483.0  512.0

    E      F      Q      D      N      M      A      D
  367.5  381.5  531.0  316.0  622.0  444.0  325.0  269.0

    N      H      P      M      F      O      K      L
  385.5  354.0  496.5  474.5  410.5  361.5  430.0  394.5

Fig. 3. Quantitative experiment of 1926, yields in lb. per plot.


It was realised that the 1925 experiment was inadequately designed
with respect to the treatments included, and the distribution of plots
within the blocks; consequently in 1926, by using every possible
combination and a strictly random arrangement, the experiment was
greatly improved. The analyses of variance are set out in Tables VI
and VII and the plan of the arrangements in randomised blocks in
Figs. 2 and 3.


Table VI. Analysis of variance, 1925.

                     Degrees
Variance due to     of freedom    Sums of squares    Mean square
Treatment               11            464,251           42,205
Blocks                   3             22,030            7,343
Parallels               33             34,285            1,039
Total                   47            520,566
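
Table VI can be recomputed from the plot yields of the 1925 quantitative
experiment. In the sketch below the yield of treatment Q in Block III is
taken as 501, the reading that agrees with the sums of squares of
Table VI; the data structure and names are illustrative only.

```python
# Yields (lb. per plot) of the 1925 quantitative experiment, by block and treatment.
blocks = {
    "I":   dict(M=491, N=328, C=340, R=508, D=388, A=322,
                L=437, J=217, Q=487, P=464, T=272, S=516),
    "II":  dict(P=450, S=464, R=461, C=320, N=298, M=482,
                T=252, D=352, A=281, L=438, Q=515, J=315),
    "III": dict(C=341, P=439, S=456, M=466, J=247, N=344,
                T=226, L=393, D=338, R=519, A=198, Q=501),
    "IV":  dict(M=449, A=191, N=185, P=472, D=342, T=234,
                Q=461, R=475, J=157, L=377, C=298, S=441),
}
treatments = sorted(blocks["I"])
n_blocks, n_trt = len(blocks), len(treatments)

values = [y for plots in blocks.values() for y in plots.values()]
cf = sum(values) ** 2 / len(values)  # correction for the mean

ss_total = sum(y * y for y in values) - cf
ss_blocks = sum(sum(p.values()) ** 2 for p in blocks.values()) / n_trt - cf
ss_trt = sum(sum(p[t] for p in blocks.values()) ** 2 for t in treatments) / n_blocks - cf
ss_error = ss_total - ss_blocks - ss_trt

df = {"Treatment": n_trt - 1, "Blocks": n_blocks - 1,
      "Parallels": (n_trt - 1) * (n_blocks - 1)}
ss = {"Treatment": ss_trt, "Blocks": ss_blocks, "Parallels": ss_error}
for name in df:
    print(f"{name:10s} {df[name]:3d} {ss[name]:10.0f} {ss[name] / df[name]:8.0f}")
```

With only one positional classification the error is what remains after
removing treatments and blocks, so that it has 33 degrees of freedom.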




Table VII. Analysis of variance, 1926.

                     Degrees
Variance due to     of freedom    Sums of squares    Mean square
Treatment               15            261,497           17,433
Position                 3             11,303            3,768
Parallels               45             97,361            2,164
Total                   63            370,161

The 1926 results call for some explanation. For an experiment in
which care has been taken to reduce to a minimum disturbing factors
contributing to error, the errors are disconcertingly high. This can be
traced to the very small amount of positional variance which has been
eliminated. The variance due to position is largely caused by soil
heterogeneity, as is also the random variance. The difference between the two
lies in the fact that the former is due to systematic changes in fertility
affecting whole blocks (inter-block variance), whilst the latter is sporadic
in its incidence (intra-block variance). So long as the size of the block
is such that the changes of fertility which must occur even in one block
are systematic, the variation will be reflected in a large positional
variance which is all to the good. If, however, the blocks get so large that
within the blocks there is local heterogeneity which is not systematic in
incidence, such heterogeneity will increase the remainder or random
variance. The question as to how much soil heterogeneity variance
makes its appearance in the one or the other sections into which the
analysis of variance is divided, depends entirely upon the inter-relation
of plot size with block size and the type of soil heterogeneity encountered.

In the present instance it would appear that as only some 10 per
cent. of the sum of squares contributable by soil fertility variation is
assignable to systematic changes, the blocks have been too large to
fulfil their function. Greater replication of smaller blocks would have
improved the experiment.

It will be noticed that every experiment of this type really
constitutes a sort of uniformity trial in addition to answering the normal
agricultural purpose. From a number of experiments carried out on
one field over a variety of seasons, a very much fuller knowledge of the
behaviour of the field is obtained than could be gained from a similar
series of the older type.

Type III.

The failure to realise the standard of accuracy desired in the 1926
experiment led to further discussion of experimental design and the
evolution of a further elaboration. The simple expedient mentioned
above of increasing the replication in order to ensure greater accuracy,
was not possible on the potato crop. To have done so would have brought
the number of plots in potatoes above the number which could be
successfully harvested in the interval between maturity and the onset
of bad weather. More plots would have rendered lifting either impossible
or at any rate unsatisfactory from the experimental point of view. Any
improvement to be effected had to be accomplished without a large
increase in plot number, because of this very practical and relevant
restriction. The difficulty was overcome by amalgamating the qualitative
and quantitative trials. In 1926, these two totalled 80 plots, a Latin
square of 16, and 4 blocks of 16. In 1927, the two investigations were
combined in an experiment of 81 plots. In order to do this the
quantitative side had to be cut down to three increments of nitrogen and
potash, but as will appear later there was a marked improvement of
accuracy and a much greater fund of information available from the
new design.
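
Returning for a moment to the quantitative experiments of 1925 and 1926,
the statement that only some 10 per cent. of the soil variation was
removed by the blocks in 1926 can be checked from Tables VI and VII. The
sketch assumes that the sum of squares ascribable to soil fertility is the
sum of the positional and parallel items, which is one reading of the
text; the names are illustrative.

```python
# Sums of squares from Tables VI (1925) and VII (1926).
soil = {
    1925: {"blocks": 22030, "parallels": 34285},
    1926: {"blocks": 11303, "parallels": 97361},
}

# Share of the soil sum of squares that the blocks succeeded in removing.
share = {
    year: ss["blocks"] / (ss["blocks"] + ss["parallels"])
    for year, ss in soil.items()
}

for year, value in share.items():
    print(year, f"{100 * value:.1f} per cent. removed by blocks")
```

On this reckoning the blocks of twelve plots used in 1925 removed a much
larger share than the blocks of sixteen used in 1926.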


[Plan of the 81 plots of the 1927 experiment, each plot showing its
manurial treatment and its yield.]

Fig. 4. Qualitative and quantitative experiment of 1927.

The two numbers in the upper line represent the quantities of nitrogenous and potassic
manures, the kind of the latter used being indicated by S for potassium sulphate in cwt.
per acre, M for potassium chloride containing equal amount of potassium, and P for the
equivalent low grade salt. The lower numbers represent the yield in lb. of a plot of
one-fortieth of an acre.




The quantities of potash and nitrogen are shown below:

                  Equivalents of sulphate of potash, cwt.
Sulphate of
ammonia, cwt.          0          2          4

      0               0 0        0 2        0 4
      2               2 0        2 2        2 4
      4               4 0        4 2        4 4

and the arrangement in Fig. 4 where S, M, P indicate the source of the
potash applied.

These nine treatments constituted the block, and of such blocks there 
were nine in all. The potash had however to be divided out amongst 
the three kinds, sulphate, muriate and low grade; there being three 
plots receiving double and three single potash, one of each in each block 
were allotted to each kind. The manner of allotting these qualitative 
differences amongst the varying quantities of nitrogen requires detailed 
description. The actual position of a plot considered only as representing 
potash and nitrogen interactions was determined entirely by chance. 
The element of chance also operated largely in the disposition of the 
qualitative factor, but there was one restriction. The restriction
provided that any particular variety of potash manure should occur in the
total of the nine blocks in conjunction with every amount of nitrogen 
three times. In every other way the distribution was at random. 
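
One way of satisfying the restriction described above may be sketched as
follows. It is an assumption of this sketch, not something stated in the
text, that the restriction applies separately to the single and to the
double dressing, and that it is met by using each cyclic shift of a
randomly ordered list of the three kinds in three of the nine blocks. The
function names and the seed are hypothetical.

```python
import random
from collections import Counter

N_LEVELS = (0, 2, 4)   # sulphate of ammonia, cwt.
K_LEVELS = (0, 2, 4)   # equivalents of sulphate of potash, cwt.
KINDS = ("S", "M", "P")


def allot_kinds(rng):
    """For one potash level: the kind used with each nitrogen level in each block."""
    base = list(KINDS)
    rng.shuffle(base)
    # the three cyclic shifts pair every kind with every nitrogen level once
    shifts = [tuple(base[(i + r) % 3] for i in range(3)) for r in range(3)]
    rows = shifts * 3          # each shift serves three of the nine blocks
    rng.shuffle(rows)          # which blocks receive which shift is left to chance
    return rows                # rows[block][nitrogen index] -> kind


def design(seed=0):
    rng = random.Random(seed)
    kinds = {k: allot_kinds(rng) for k in K_LEVELS if k > 0}
    blocks = []
    for b in range(9):
        plots = []
        for ni, n in enumerate(N_LEVELS):
            for k in K_LEVELS:
                kind = kinds[k][b][ni] if k else None
                plots.append((n, k, kind))
        rng.shuffle(plots)     # position within the block determined by chance
        blocks.append(plots)
    return blocks


blocks = design()
pairings = Counter((k, kind, n) for plots in blocks for n, k, kind in plots if k)
print(blocks[0])
```

Each block then contains the nine combinations of nitrogen and potash
once, with one plot of each kind at each potash level, and each kind meets
each quantity of nitrogen three times at either level.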

The amount of replication in this experiment varies with each factor
or interaction of factors concerned. The number of independent
comparisons which can be made is thus summarised:

                                                                   Number of
                                                                  comparisons
(1) Action of potash in varying quantities in combination with a
    standard quantity of nitrogen                                      27
(2) Action of nitrogen in varying quantities with standard potash
    (quantitative)                                                     27
(3) Interaction of nitrogen and potash in every combination             9
(4) Between kinds of potash                                            18
(5) Differential response of kind of potash to quantity of potash       9
(6) Differential response of kind of potash to quantity of nitrogen     9
(7) Differential response of kind of potash to quantity of potash
    and nitrogen varying simultaneously                                 3
(8) Elimination of soil heterogeneity                                   9

The experiments of 1925 and 1926 gave information on sections
1-4; sections 5-7 are additional information and the accuracy of the
comparison between kinds of potash is enormously enhanced, there
being now 18 comparisons in place of 4.

The advantage of this type of survey experiment is, as has been
pointed out, very great. For each comparison an appropriate error is
obtained with respect to which interpretation can be made.
Consideration of the mean yields in conjunction with their appropriate errors
shows how greatly improved is the standard of accuracy of the qualitative
side of the trial. In 1925, although the means showed the apparent
order of efficiency to be sulphate, muriate, low grade salts, even by
taking the maximum difference sulphate versus low grade salts (a not
entirely fair method), the differences were only probably significant and
not completely so, and a similar state of affairs is seen in 1926 with
respect to the greatest difference, muriate versus low grade salts.

In the 1927 trial, a summary of which is shown in Table VIII, a much 
closer control is established and for double dressings the depression of 
the low grade salts is significant. The depressing effect of muriate rests 
as a probability and the results show that the effect is felt least where 
the dressings of nitrogen are high— i.e. where the manurial effect of 
potash would be more apparent. The sulphate appears to function 
normally at all values of nitrogen. 


Table VIII. Analysis of variance, 1927.

                                  Degrees        Sums of       Mean
Variance due to                  of freedom      squares      square
Potash and nitrogen                   8           49,905       6,238
Quality of potash                     2           14,458       7,229
Quantity v. quality of potash         2            1,005         503
Blocks                                8           21,442       2,680
Error                                60           33,919         565
Total                                80          120,729

Average yield in tons per acre.

                 Nitrogen, cwt.                  Potash, kind
Potash,
cwt.          0        2        4            S        M        P

  0         6.545    7.061    7.158        6.921    6.921    6.921
  2         6.514    7.532    7.367        7.383    7.164    6.858
  4         6.193    7.216    7.435        7.348    7.037    6.458

Standard error 0.141 ton.


The analysis of variance shows that differences of decided significance 
have been obtained both on the quantitative and qualitative questions. 
The table of average yields shows (i) the responses to increasing dressings 
of nitrogen, not very large in absolute amount, but capable of fair 
quantitative estimation in an experiment of the precision actually 
attained, (ii) a decided response to the first dose of potash, but much 
less, if any, to the second dose, (iii) that the second dose, when all three 
kinds of potassic manure are considered together, is deleterious in the 
absence of nitrogen, but probably becomes beneficial when the total 
yield is stimulated by heavy nitrogenous dressings. 




The table showing the three kinds of potash separately is of special
interest in providing unequivocal confirmation of the conclusions
indicated, but without sufficient statistical significance, by the earlier
experiments; all levels of nitrogenous manuring are here thrown
together. With sulphate we have a decided increase from the first dose,
and no appreciable decrease due to the second dose. With muriate the
yield with double potash is about midway between those obtained with
none and with single potash; while with a source of potash which
contains much additional sodium chloride, the first dose has on the average
no appreciable effect, while the second dose produces a decided loss of
yield.

If we contrast the yields at the same level of abundance of potash,
we find sulphate beating muriate by 0.22 ton at the single level, and by
0.31 ton at the double level; while it beats the potash manure salts by
0.525 at the single level, and by 0.890 at the double level, the difference
being two and one-half to three times as great. It is clear that we must
interpret these results, not as due to any difference in availability of
the potash, but as due to other effects, presumably the presence of
chloride, which effect a quantitative depression of the yield nearly
proportional to the quantity of chloride present. The use of no-potash
plots designed to show that the crop is really ready to respond to
available potash, while essential if availability is in question, is quite
superfluous for the examination of effects of this kind, which are most
clearly seen with the second dose of potash, to which there is in the
present experiment no appreciable response.
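
The contrasts quoted in this paragraph follow directly from the table of
average yields. The short sketch below, with illustrative names, works out
the deficit of muriate and of the potash manure salts below sulphate at
each level, and the ratio of the two deficits.

```python
# Mean yields (tons per acre) by kind of potash, from the table of averages.
yields = {
    2: {"S": 7.383, "M": 7.164, "P": 6.858},   # single dressing
    4: {"S": 7.348, "M": 7.037, "P": 6.458},   # double dressing
}

# Amount by which sulphate beats each of the other two sources.
deficit = {
    level: {kind: round(row["S"] - row[kind], 3) for kind in ("M", "P")}
    for level, row in yields.items()
}

# How many times greater the deficit of the low grade salt is than that of muriate.
ratio = {level: d["P"] / d["M"] for level, d in deficit.items()}

for level in yields:
    print(level, deficit[level], round(ratio[level], 2))
```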

Summary. 

The development is recorded of the series of experiments with potatoes 
at Rothamsted during 1925-27, designed to examine the quantitative 
response of yield to varying quantities of nitrogenous and potassic 
manures, and to test the relative value with this crop of different sources 
of potash. 

While rather precise comparisons were obtained on the qualitative
question by means of Latin squares in 1925-26, the reality of the
depression ascribable to chloride could not be demonstrated in these years,
but became clearly apparent when, in the following year, the qualitative
experiment was merged with the quantitative one.

In the earlier quantitative experiments, although satisfactory
responses were obtained, the precision of the results left much to be
desired, since only four replicates could be used. When by merging the
experiments this was increased to nine replicates, much smaller
responses were clearly measurable.

The large and complex type of experiment finally adopted thus
supplied more precise information on both heads than could previously
be obtained, and in addition a more thorough exploration of the
different combinations possible.

REFERENCES. 

(1) T. Eden and R. A. Fisher. "The experimental determination of the value of top
dressings with cereals." J. Agri. Sci. (1927), 17, 548-562.

(2) Fisher, R. A. Statistical Methods for Research Workers. Oliver and Boyd,
Edinburgh (1925). 2nd edit. 1928.



19.204a 


19 

THE DISTRIBUTION OF GENE RATIOS FOR
RARE MUTATIONS *


AUTHOR’S NOTE 

The two subjects of this paper which deserve attention are (i) its
bearing upon evolutionary theory, and (ii) the mathematical treatment
of a class of functional equations.

In 1922 the author had attempted to examine whether the genetical
situation indicated somewhat vaguely by observational data with
quantitative characters could be more fully elucidated on the basis
of the theoretical concepts of genetics. It was first necessary to
form an opinion on the distribution of the gene ratios of factors
exposed to different selective conditions, and then to ascertain their
respective contributions to the genetic variance, and to selective
progress.

The mathematical treatment in 1922 left much to be desired, since 
on reflection it appeared that more searching questions could be 
asked, and especially the probable progress of new mutations could 
be traced statistically, making the distinction between the frequency 
of the new gene and that of the old. The numerical error to which 
attention is called in the first section of this paper is of little conse¬ 
quence, since the corresponding distribution is unchanged. It is not 
on this point that I have differed from Professor Sewall Wright, but 
in that I do not share his conviction that evolutionary progress is 
favoured by the subdivision of a species into small, imperfectly iso¬ 
lated populations, save in the case stressed by Darwin in which the 
environmental conditions of these are sufficiently diverse to induce 
divergent evolutionary tendencies. Wright, on the other hand, has
maintained that random survival in such populations leads to the testing
of a greater variety of genotypes, and to the more rapid discovery
of successful combinations, while my own studies have not led me 
to believe in any such effect, as a factor contributing to organic 
evolution. 

* Reprinted from Proceedings of the Royal Society of Edinburgh, Vol. L, Pt. II,
No. 17, pp. 205-220, 1930.



19.204b 


To mathematicians the chief interest of the present paper lies in 
the treatment of the functional equations which arise in the exact 
examination of the terminal distributions, in the three cases consid¬ 
ered, namely, (i) the steady state without mutation or selection, (ii) 
equilibrium with mutations but without selection, (iii) the equilib¬ 
rium distribution for mutations having very small selective effects. 
With these distributions established, the probabilities of mutations 
of different classes establishing themselves, and their contribution to 
the frequencies at given gene ratios and to the heritable variance, are 
all calculable. The evolutionary consequences are developed in 
Genetical Theory of Natural Selection (Oxford, 1930), which is based
for these questions on the present paper. 



19.205 


XVII. The Distribution of Gene Ratios for Rare Mutations. By
R. A. Fisher, Sc.D., F.R.S. (Rothamsted Experimental Station,
Harpenden, Herts).


1. Introductory. 

In 1922 the author published a short paper, “On the Dominance Ratio,”
in the Proceedings of the Royal Society of Edinburgh (vol. xlii,
pp. 321-341). Among other results, the conclusion was drawn that in
the total absence of mutations and of selective survival, the quantity of
variation, the variance, of an interbreeding group would decrease by
reason of random survival, at a rate such that the “time of relaxation”
was 4n generations, where n is the number breeding in each generation.

The variance after the lapse of T generations was found to be
proportional to $e^{-T/4n}$.

During last year Professor Sewall Wright of Chicago has been good 
enough to send me in MS. an investigation in which, while confirming 
many other conclusions of my paper, he arrives at a time of relaxation 
of only 2n generations. Both periods are in most species so enormous 
that they lead to the same conclusion, namely, that random survival, 
while of great importance in conditioning the fate of an individual
mutant gene, is a totally unimportant factor in the balance of forces by 
which the actual variability of species is determined. Nevertheless it 
will, I hope, minimise the confusion which every error is liable to cause 
if I put on record at once my acceptance of Professor Wright’s value, 
and at the same time eradicate the error of my previous work by 
giving a more rigorous and comprehensive treatment of the whole subject. 
I may say that the previous conclusions as to the interpretation of the 
evidence for Mendelian dominance in the factors contributing to human 
variability are untouched, but that the rôle of mutations in maintaining
the current genetic variability of a species may now be set in a much 
clearer light. 

The error to be corrected lies in the derivation (p. 326) of the
differential equation satisfied by the distribution of the frequency ratios
of different factors, when none are subject to selective action. If the
two alternative genes in any locus appear in the ratio p : q, the variance
of p after one generation of random breeding will be

$$\frac{pq}{2n},$$

where n is the number breeding in each generation. To avoid the
inconvenience that this variance is a function of p, we may write

$$2p = 1 - \cos\theta, \qquad 2q = 1 + \cos\theta,$$

when

$$\delta\theta = \frac{\delta p}{\sqrt{pq}},$$

and the variance of $\theta$ is therefore very nearly constant at the value 1/2n.

Although, n being large, the values of $\delta\theta$ after one generation of
random breeding will be well represented by a normal distribution with
constant variance, yet its mean will differ from zero by an amount of
order 1/n. This was overlooked in the previous treatment; to find the
mean of $\delta\theta$ as far as terms in 1/n we may write

$$\delta\theta = \frac{\delta p}{\sqrt{pq}} - \frac{1-2p}{4(pq)^{3/2}}(\delta p)^2 + \dots;$$

then since the mean value of $\delta p$ is strictly zero, while that of $(\delta p)^2$ is
pq/2n, the mean value of $\delta\theta$ is seen to be

$$-\frac{1-2p}{8n\sqrt{pq}} = -\frac{1}{4n}\cot\theta.$$

This, of course, with values of n of many millions, is an exceedingly
small quantity, but its effect is not negligible for the discussion required,
for if

$$df = y\,d\theta$$

is the distribution of the values of $\theta$ for different factors, the flux past
every value of $\theta$ due to random reproduction in one generation is changed
from

$$-\frac{1}{4n}\frac{\partial y}{\partial\theta}$$

to

$$-\frac{1}{4n}\,y\cot\theta - \frac{1}{4n}\frac{\partial y}{\partial\theta},$$

and the differential equation to be satisfied by y becomes

$$\frac{\partial y}{\partial T} = \frac{1}{4n}\left\{\frac{\partial}{\partial\theta}(y\cot\theta) + \frac{\partial^2 y}{\partial\theta^2}\right\}, \qquad (1)$$

instead of

$$\frac{\partial y}{\partial T} = \frac{1}{4n}\frac{\partial^2 y}{\partial\theta^2},$$

the equation previously obtained; in both T is measured in generations.





2. The Solution for Steady Decay. 


It so happens that the function of $\theta$ which satisfies the true equation
in the case when, in the absence of mutations, the variance is steadily
decaying owing to chance extinctions at the termini $\theta = 0$, $\theta = \pi$, is the
same as the corresponding solution of the original erroneous equation,
namely, $y = A\sin\theta$.

Substituting in the true equation we have

$$\frac{dA}{dT}\sin\theta = -\frac{2A\sin\theta}{4n},$$

or

$$A = A_0 e^{-T/2n}$$

in place of

$$A = A_0 e^{-T/4n}$$

originally obtained. This confirms the value of 2n generations for the
time of relaxation, found by a quite independent method by Professor
Wright. The variance will then be halved by random survival in
2n log 2 = 1.4n generations. The immense length of this period for most
species shows how trifling a part random survival must play in the
balance of influences which determines the actual variability.
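As a numerical illustration of the decay law just obtained, the following minimal sketch evaluates the amplitude $A/A_0 = e^{-T/2n}$ and the halving time 2n log 2, taking the logarithm as natural. The function names and the sample value of n are assumptions made for illustration only and are not part of the paper.

```python
import math

def amplitude(T, n):
    """A/A0 after T generations, under A = A0 * exp(-T / (2n))."""
    return math.exp(-T / (2.0 * n))

def half_life(n):
    """Generations needed to halve the variance: 2n log 2 (natural log)."""
    return 2.0 * n * math.log(2.0)

n = 1_000_000  # hypothetical number breeding in each generation
print(half_life(n) / n)            # ratio of the halving time to n
print(amplitude(half_life(n), n))  # amplitude remaining after one halving time
```

The ratio printed first is the factor quoted in the text as 1.4 when rounded to two figures.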


3. Variability Maintained Constant by Mutations in the 
Absence of Selection. 

If in equation (1) we put $\partial y/\partial T$ equal to zero, we may at once
integrate the right-hand side in the form

$$\frac{\partial y}{\partial\theta} + y\cot\theta = -4nB,$$

where B is the net number of factors in each generation, the gene ratios
of which flow past any specified value of $\theta$, and the differential equation
now simply represents the fact that this flux is the same for all values
of $\theta$. The equation may now be integrated giving the primitive,

$$y\sin\theta = A + 4nB\cos\theta$$

or

$$y = A\operatorname{cosec}\theta + 4nB\cot\theta. \qquad (2)$$

If we make the convention that mutations are equally frequent in
supplying factors with $\theta$ near to zero and in supplying factors with $\theta$
near to $\pi$, the symmetrical solution

$$y = A\operatorname{cosec}\theta$$

will be appropriate; but, if we suppose all mutations occur at $\theta = 0$,
then y should tend to zero at $\theta = \pi$, and the appropriate form is

$$y = 4nB(\operatorname{cosec}\theta + \cot\theta). \qquad (3)$$

In either case the integral of y to the limit of its range at $\theta = 0$
fails to converge, so that the relation between the number of factors
maintained and the rate of mutation cannot be made out without an
investigation of the terminal conditions. Before passing on to consider
these, we may consider the distribution now obtained as a distribution
not of $\theta$ but of the more convenient variate $z = \log(p/q)$, the logarithmic
gene ratio. The frequency distribution (3) may be represented on the
scale of z by noting that

$$df = 4nB(\operatorname{cosec}\theta + \cot\theta)\,d\theta = 4nB\,q\,dz = 4nB\,\frac{dz}{1 + e^{z}}.$$

This frequency distribution is illustrated by curve B in fig. 1. The
frequency ordinate is nearly constant for values of z less than -4, at
which point the mutant gene occupies nearly 2 per cent. of the available
loci; it falls to half its previous value when z is raised to zero, when
50 per cent. of the loci are occupied by mutant genes. For higher gene
ratios still, the frequency falls rather rapidly to zero. Since the frequency
ordinate is nearly constant for high negative values of z, the total
frequency maintained depends on how far the curve may be carried to
the left, or how large (negative) values the logarithmic frequency ratio,
z, may have. Evidently this will depend on the size of the population,
and an exact treatment will evidently require an examination of the
terminal conditions.

[Fig. 1. Frequency curves of logarithmic gene ratio, for different levels of selective advantage;
note that the frequency ordinate is highest for the most extreme admissible negative values
of z, and remains nearly constant over a range which is extremely sensitive to small selective
intensities. Abscissa: values of z, from -8 to 8.]
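The statements about curve B can be checked directly from the form $df = 4nB\,dz/(1+e^{z})$. The sketch below is illustrative only: the function names are assumed, and the ordinate is expressed in units of 4nB.

```python
import math

def mutant_fraction(z):
    """Proportion p of loci occupied by the mutant gene, from z = log(p/q)."""
    return 1.0 / (1.0 + math.exp(-z))

def ordinate(z):
    """Frequency ordinate of distribution (3) on the scale of z, in units of 4nB."""
    return 1.0 / (1.0 + math.exp(z))

for z in (-8, -4, 0, 4):
    print(z, round(mutant_fraction(z), 4), round(ordinate(z), 4))
```

At z = -4 the mutant fraction comes out a little under 2 per cent., and the ordinate at z = 0 is one half of its limiting value for large negative z, in agreement with the description above.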

4. Distributions Expressed by Functional Equations. 

A very powerful method of approach was indicated, but not utilised,
in the previous paper. If

$$p_0,\ p_1,\ p_2,\ \dots$$

are the probabilities of an individual gene carried by a member of the
species, being represented in the next generation in 0, 1, 2, . . . offspring,
we may define a function

$$f(x) = p_0 + p_1 x + p_2 x^2 + \dots$$

for values of x between 0 and 1, and it has been shown that to consider
the offspring of two individuals instead of one, we have only to substitute
$\{f(x)\}^2$ for $f(x)$.

Consequently, if the number of factors in which the rarer gene
occupies 1, 2, 3, . . . loci are given by $\pi_1, \pi_2, \pi_3, \dots$, and if

$$\phi(x) = \pi_1 x + \pi_2 x^2 + \pi_3 x^3 + \dots,$$

the effect upon $\phi$ of random breeding for one generation is to substitute
$\phi\{f(x)\}$ for $\phi(x)$.

In practice we shall require to use the form

$$f(x) = e^{x-1},$$

and if we first take the case of extinction of genes without mutation, the
distribution of gene frequencies, which maintains its form, while one
factor is extinguished in each generation must satisfy the functional
equation

$$\phi(e^{x-1}) - \phi(x) = \tfrac{1}{2},$$

for the distribution being symmetrical, half the extinctions may be taken
to be reductions from 1, 2, 3, . . . loci to zero, and half to be increases
to 2n.

The corresponding equation for the generating function $\phi$, for the
case of a distribution in equilibrium with mutations at the rate of one
in each generation, is

$$\phi(e^{x-1}) - \phi(x) = 1 - x,$$

for a mutation may be represented as an increase of unity in the number
of genes occupying one locus only, and a corresponding decrease of the
(indefinite) number occupying no loci.

The solutions of these equations will be shown to correspond with the 
solutions of the differential equations obtained above, and to admit in 
addition of an investigation of the terminal condition. 

5. The Function $u_\nu$.

In order to solve the functional equations, we define a function $u_\nu$ of
a single real variable $\nu$, which shall satisfy the condition

$$u_{\nu+1} = e^{u_\nu - 1},$$

starting from the arbitrary value $u_0 = 0$. The values of $u_1, u_2, \dots$ may
now be obtained by direct substitution, and these evidently tend to unity
as a limit. To obtain a form for large values of $\nu$, we may put

$$v_\nu = \frac{1}{1 - u_\nu}$$

and obtain

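$$v_{\nu+1} = v_\nu + \frac{1}{2} + \frac{1}{12v_\nu} - \frac{1}{720v_\nu^3} + \frac{B_3}{6!\,v_\nu^5} - \dots,$$

where $B_3$, etc., stand for the Bernoulli numbers

$$B_1 = \tfrac{1}{6},\quad B_2 = \tfrac{1}{30},\quad B_3 = \tfrac{1}{42},\quad \dots,\quad B_6 = \tfrac{691}{2730}.$$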

It appears from the recurrence formula of v that when $\nu$ is large a
first approximation is given by

$$v_\nu = \tfrac{1}{2}\nu,$$

and substituting this in the third term of the expression, we obtain the
second approximation

$$v_\nu = \tfrac{1}{2}\nu + \tfrac{1}{6}\log\nu,$$

the error of which will tend to a finite limit, as $\nu$ tends to infinity.
Equally

$$\tfrac{1}{2}\nu - v_\nu + \tfrac{1}{6}\log v_\nu$$

must tend to a constant value as $\nu$ is increased indefinitely. Let -C stand
for this constant, and let

$$w_\nu = \tfrac{1}{2}\nu - v_\nu + \tfrac{1}{6}\log v_\nu + C,$$

then we may obtain an expansion for w in inverse powers of v; for
the recurrence formula provides that

$$w_{\nu+1} - w_\nu = -\left(\frac{1}{12v_\nu} - \frac{1}{720v_\nu^3} + \dots\right) + \frac{1}{6}\log\left\{1 + \frac{1}{2v_\nu} + \frac{1}{12v_\nu^2} - \dots\right\},$$

and expanding this expression we obtain, dropping the suffix of v,

tr^ _ ^r•* _ v-^ v~^ ^ ^ 1473»“* v“® 

144 7^ 24.720'42.720 ~ 1512.720 1680.720 336.720* “ 924.72.720 “ 

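as an expansion of $w_{\nu+1} - w_\nu$; the first term shows that the leading
term in the expansion of w is 1/72v, for

$$\frac{1}{v_{\nu+1}} - \frac{1}{v_\nu} = -\frac{1}{2v^2} + \frac{1}{6v^3} - \frac{1}{24v^4} + \dots,$$

and similar expansions may be obtained for $1/v_{\nu+1}^2 - 1/v_\nu^2$, and so on. We
thus obtain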

tr-i v-2 v-3 711;-“ 8759»;-5 

72 1080 “ 108.144 “ 168.722 630.720® 

31v-« 1637 ?^-T 20879093!;-8 

8l . 720®’^ 1008.720® 9504.840.72oa’ 

While the last three coefficients are all less than $10^{-5}$, they show no
such decided tendency to decrease as would justify our evaluating the
constant C by putting v = 1, $\nu$ = 0, a substitution which shows C to exceed
by unity the limit of the sum of the coefficients. We may, however, use
the larger values of v found by the recurrence formula for somewhat
higher integral values of $\nu$.

For example, at $\nu = 5$, the last three terms in w are less than $10^{-9}$, so
that w will not be much in error in the ninth place of decimals; u is
found by direct substitution in the recurrence formula to be .73192 31844
and

$$v - \tfrac{1}{2}\nu - \tfrac{1}{6}\log v + w = 1.01464\,8607$$

gives a value of C nearly right to the last figure. To improve much upon
this, it would be necessary to work to more than 10 places in the
calculation of u. As a check, working to 14 places up to $u_{10}$, where the
last term retained in w is about $2 \times 10^{-12}$, the value was found to be
1.01464 86071 7, a value which shows that the apparent precision attained
by the series is not illusory.
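The recurrence $u_{\nu+1} = e^{u_\nu - 1}$ and the evaluation of C lend themselves to a short numerical check. The sketch below is an illustration under stated assumptions, not the author's working: the function names are invented for the purpose, and w is cut off after its three leading terms, taken here as $v^{-1}/72 + v^{-2}/1080 - v^{-3}/(108 \cdot 144)$, the signs being worked out from the recurrence rather than read from the printed series.

```python
import math

def u_sequence(steps):
    """u_0 = 0 and u_{nu+1} = exp(u_nu - 1); returns [u_0, ..., u_steps]."""
    u = [0.0]
    for _ in range(steps):
        u.append(math.exp(u[-1] - 1.0))
    return u

def estimate_C(nu, u_nu):
    """C = v - nu/2 - (1/6) log v + w, with w truncated after three terms.

    The truncation of w is an assumption of this sketch.
    """
    v = 1.0 / (1.0 - u_nu)
    w = 1.0 / (72.0 * v) + 1.0 / (1080.0 * v**2) - 1.0 / (15552.0 * v**3)
    return v - nu / 2.0 - math.log(v) / 6.0 + w

u = u_sequence(20)
print(u[5])                      # compare with the value of u quoted at nu = 5
for nu in (5, 10, 20):
    print(nu, estimate_C(nu, u[nu]))
```

Because only three terms of w are kept, the estimate at $\nu = 5$ agrees with the quoted constant to about six decimal places, and the agreement improves as $\nu$ is raised.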





6. Solutions of the Functional Equations. 
If, in the equation

$$\phi(e^{x-1}) - \phi(x) = \tfrac{1}{2},$$

we substitute

$$x = u_\nu,$$

we have

$$\phi(u_{\nu+1}) - \phi(u_\nu) = \tfrac{1}{2},$$

which is satisfied if $\phi$ is the same function of x as $\tfrac{1}{2}\nu$ is of u. But we
know that

$$\tfrac{1}{2}\nu = \frac{1}{1-u} + \tfrac{1}{6}\log(1-u) - C + \frac{1-u}{72} + \frac{(1-u)^2}{1080} - \dots;$$

hence, apart from a finite fraction of the frequency, $\phi$ may be expanded
in powers of x in the form

$$\phi = \tfrac{5}{6}x + \tfrac{11}{12}x^2 + \tfrac{17}{18}x^3 + \dots, \qquad (4)$$

showing that in the distribution of gene ratios appropriate to steady
extinction without mutation or selection, the frequency of factors
represented in k loci must, when k is large, tend to unity. Since each step
increases the gene proportion p by 1/2n, we have, apart from the extremes
of the distribution,

$$df = 2n\,dp = n\sin\theta\,d\theta,$$

in agreement with the solution obtained for this case from the differential
equation. The total number of factors at all frequencies will be

$$2n - \tfrac{1}{6}(\gamma + \log 2n) - 0.01464\,86071\,7,$$

(where $\gamma$ is Euler's constant 0.577215664), the remainder of which is
negligible compared with the first term, twice the number of individuals
breeding in each generation, thus verifying the rate of decay to be 1 in 2n
in each generation.

The exact treatment of the terminal frequencies, which shall account
for the distribution of the finite quantity 0.014649, omitted from
expression (4), evidently requires the differential coefficients of $\tfrac{1}{2}\nu$ with respect
to u at the value u = 0. Since the series for w in powers of (1 - u) is
itself doubtfully convergent at this value, its differential coefficients may
be still less relied upon to converge; we therefore require reduction formulae
for these coefficients.

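From the recurrence formula

$$u_\nu - 1 = \log u_{\nu+1},$$

we have, differentiating with respect to $\nu$,

$$\frac{du_\nu}{d\nu} = \frac{1}{u_{\nu+1}}\frac{du_{\nu+1}}{d\nu},$$

or

$$\frac{d\nu}{du_\nu} = u_{\nu+1}\frac{d\nu}{du_{\nu+1}},$$

from which the value of $d\nu/du$ for a lower value can be obtained with the
same relative precision as at the higher.

We may write the relation in the form

$$\nu_0' = u_1\nu_1',$$

with the understanding that any suffixes differing by unity can be
substituted for those indicated. Since also

$$\frac{d}{du_0} = u_1\frac{d}{du_1},$$

we can at once derive the further relations

$$\nu_0'' = u_1\nu_1' + u_1^2\nu_1'',$$

$$\nu_0''' = u_1\nu_1' + 3u_1^2\nu_1'' + u_1^3\nu_1''',$$

$$\nu_0^{iv} = u_1\nu_1' + 7u_1^2\nu_1'' + 6u_1^3\nu_1''' + u_1^4\nu_1^{iv},$$

$$\nu_0^{v} = u_1\nu_1' + 15u_1^2\nu_1'' + 25u_1^3\nu_1''' + 10u_1^4\nu_1^{iv} + u_1^5\nu_1^{v},$$

$$\nu_0^{vi} = u_1\nu_1' + 31u_1^2\nu_1'' + 90u_1^3\nu_1''' + 65u_1^4\nu_1^{iv} + 15u_1^5\nu_1^{v} + u_1^6\nu_1^{vi},$$

and so on.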
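The numerical coefficients in these relations follow mechanically from repeated application of the operator $u_1\,d/du_1$, since $u\,\frac{d}{du}\left(u^{j}\nu^{(j)}\right) = j\,u^{j}\nu^{(j)} + u^{j+1}\nu^{(j+1)}$. A minimal sketch of that bookkeeping is given below; the function names are assumptions introduced for illustration.

```python
def next_row(row):
    """Apply u d/du to sum_j row[j-1] * u**j * v^(j) and collect coefficients.

    Each term u**j v^(j) yields j * u**j v^(j) + u**(j+1) v^(j+1).
    """
    new = [0] * (len(row) + 1)
    for idx, coeff in enumerate(row):
        j = idx + 1
        new[idx] += j * coeff      # part that keeps the same order j
        new[idx + 1] += coeff      # part raised to order j + 1
    return new

def relation_coefficients(order):
    """Coefficients of u1**j * v1^(j), j = 1..order, in the relation of that order."""
    row = [1]                      # first relation: v0' = u1 v1'
    for _ in range(order - 1):
        row = next_row(row)
    return row

for k in range(1, 7):
    print(k, relation_coefficients(k))
```

The rows for the fifth and sixth orders contain the coefficients 15, 25 and 90 that appear in the relations as printed.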

From these it is evident that, knowing the series of differential
coefficients of $\nu$ with respect to u at any integral value such as $\nu = 5$, the
corresponding series may be obtained step by step down to $\nu = 0$. In this
way we obtain, for the series of coefficients

$$\frac{1}{2}\cdot\frac{1}{k!}\,\frac{d^k\nu}{du^k},$$

the values:


 k.   True Value.    Approximation.   Error.          Remainder.

 1    .818,202,78    .833,333,33      +.015,130,55    -.000,481,94
 2    .916,762,37    .916,666,67      -.000,095,70    -.000,386,24
 3    .944,923,44    .944,444,44      -.000,479,00    +.000,092,76
 4    .958,266,12    .958,333,33      +.000,067,21    +.000,025,55
 5    .966,634,08    .966,666,67      +.000,032,59    -.000,007,04
 6    .972,225,35    .972,222,22      -.000,003,13    -.000,003,91


The table shows in parallel columns (i) the values derived from the
reduction formula from those at $\nu = 6$, (ii) the values given by the
approximation 1 - 1/6k, (iii) the differences between these values, (iv) the remainder
of the deviations needed to make up the total +.014,648,61. The mere
fact that this difference decreases at every step, and is finally reduced to a
very trifling value, indicates that the errors shown in the first six terms,
small as they are, are far greater than those to be anticipated at higher
values of k.

The second functional equation, appropriate for variability maintained
by a constant supply of mutations, has the form

$$\phi(e^{x-1}) - \phi(x) = 1 - x;$$

substituting here $x = u_\nu$ we have

$$\phi(u_{\nu+1}) - \phi(u_\nu) = 1 - u_\nu;$$

but from the recurrence formula

$$\log u_{\nu+1} = u_\nu - 1,$$

hence

$$\frac{du_{\nu+1}}{d\nu} = u_{\nu+1}\frac{du_\nu}{d\nu},$$

so the functional equation may be written

$$\phi(u_{\nu+1}) + \log\frac{du_{\nu+1}}{d\nu} = \phi(u_\nu) + \log\frac{du_\nu}{d\nu},$$

which is satisfied if

$$\phi(u) + \log\frac{du}{d\nu}$$

is a constant. Since by its definition $\phi(0) = 0$, we thus find that $\phi$ is the
same function of x as

$$\log\nu' - \log\nu_0'$$

is of u. The approximate form,

$$\frac{1}{1-u} = \tfrac{1}{2}\nu,$$

gives

$$\log\nu' = \log 2 - 2\log(1-u),$$

so that an approximation is given by

$$\phi(x) = -2\log(1-x),$$

which will account for the whole frequency save for

$$\log 2 - \log\nu_0' = .200,645,07.$$

The frequency at p = k/2n is now found to be 2/k when k is large, or the
frequency element to be

$$\frac{2\,dp}{p} = \frac{2\sin\theta\,d\theta}{1-\cos\theta} = \frac{2(1+\cos\theta)\,d\theta}{\sin\theta} = 2(\operatorname{cosec}\theta + \cot\theta)\,d\theta,$$

thus confirming the solution obtained by means of the differential equation.
By the present method, however, we can evaluate the total number of
factors maintained in the specific variance by one mutation in each
generation as

$$2(\gamma + \log 2n) + .200,645,07,$$

the value of which ranges from 30.372 to 57.903 as n changes from $10^{6}$ to
$10^{12}$.

The exact terminal frequencies for this case may be obtained from 


hence 




log + 


which, on expansion in powers of u, yields the frequency coefficients of the 
following table : 


 k.   True Value.     Approximation.   Error.          Remainder.

 1    2.240,917,26    2.000,000,00     -.240,917,26    +.040,272,19
 2     .953,776,16    1.000,000,00     +.046,223,84    -.005,951,65
 3     .671,863,62     .666,666,67     -.005,196,95    -.000,754,70
 4     .501,095,71     .500,000,00     -.001,095,71    +.000,341,01
 5     .399,761,71     .400,000,00     +.000,238,29    +.000,102,72

showing, as in the previous case, that the discrepancy from the approximate 
formula is confined, for all practical purposes, to the extreme terminal values. 


7. The Effects of a Small Selective Advantage 
OR Disadvantage. 

The method of functional equations has now made clear in what way
the terminal forms of the solutions of the differential equations should be
interpreted; we may therefore now consider the differential equation
appropriate to mutations enjoying a small selective advantage, such supplying in
all probability the greater portion of the genetic changes taking place in
the course of evolution.

If a is the selective advantage of the mutant genes, the flux past any
value of $\theta$ may be written as

$$\tfrac{1}{2}ay\sin\theta - \frac{1}{4n}\left(y\cot\theta + \frac{\partial y}{\partial\theta}\right), \qquad (5)$$

provided $a^2$ may be neglected. It should be noted that the equation will
only be correct if $a^2n$ is a small quantity, and this limits its application
to very minute selective intensities. For these, however, the equilibrium
condition of constant flux yields a differential equation for y of the first
order, which may be written

$$\frac{dy}{d\theta} - (2an\sin\theta - \cot\theta)\,y = -4anA$$

and may be integrated in the form

$$y\sin\theta\,e^{2an\cos\theta} = 2Ae^{2an\cos\theta} + B.$$

Since $\cos\theta = -1$ when $\theta = \pi$, the condition that at this terminus, where
no mutations are occurring, $y\sin\theta$ should be zero, is that

$$B = -2Ae^{-2an},$$

giving the solution

$$y = 2A\operatorname{cosec}\theta\left\{1 - e^{-2an(1+\cos\theta)}\right\}.$$

At the terminus $\theta = 0$ this will correspond to the distribution in
equilibrium with one mutation per generation if

$$A = \frac{2}{1 - e^{-4an}},$$

so that the distribution adopted is

$$df = y\,d\theta = 4\operatorname{cosec}\theta\,d\theta\,\frac{1 - e^{-2an(1+\cos\theta)}}{1 - e^{-4an}}.$$

Fig. 1, C, shows the distribution on the scale of z for an = 1, while
the curve A on the same figure shows the curve for factors at a minute
selective disadvantage, an = -1.

While the curve of continuous distribution represents the frequencies
well over that part of the range in which z is numerically considerably less than
log n, the termini of the distribution are subject to adjustments similar
to those investigated in the absence of selection. Thus at the terminus
$\theta = 0$, the frequency element $4\operatorname{cosec}\theta\,d\theta$ will be replaced by a series of
frequencies for 1, 2, 3, . . . genes given approximately by the series 2, 1, 2/3, . . .,
while at the terminus $\theta = \pi$ we have the frequency element

$$\frac{8an\operatorname{cosec}\theta\,(1+\cos\theta)\,d\theta}{1 - e^{-4an}},$$

the limit of which is

$$\frac{4an\sin\theta\,d\theta}{1 - e^{-4an}},$$

the form appropriate to steady extinction without mutation, the rate of
extinction at this terminus being

$$\frac{2a}{1 - e^{-4an}}$$

in each generation; this rate may equally be obtained by substituting
in the assumed flux of factors, Aa, the solution

$$A = \frac{2}{1 - e^{-4an}}.$$

The probability of a mutant, enjoying a small selective advantage a,
spreading until it establishes itself throughout the entire population is
thus found to be $2a/(1 - e^{-4an})$; it is easy to see that with an indefinitely
large population, or in any case if 4an is large, this expression reduces
to 2a. Thus a mutation conferring a selective advantage of 1 per cent.
will have practically a 2 per cent. chance of establishing itself. The
value of this probability affords a means of checking the accuracy of
our solution for values of a which, while still small, are large enough
to vitiate the condition that $a^2n$ should be small, the condition subject
to which the differential equation has been obtained. For, in an
indefinitely large population, the exact probability of ultimate survival is
given by 1 - U, where U satisfies the equation

$$U = e^{c(U-1)}$$

and

$$c = 1 + a.$$

Writing P for 1 - U, we have

$$cP = -\log(1-P) = P + \tfrac{1}{2}P^2 + \tfrac{1}{3}P^3 + \dots,$$

showing that when a is small, even though $a^2n$ may be large, the value
2a is a good approximation to the probability of survival.

[Fig. 2. Probability of success for mutations having a very minute selective advantage
or disadvantage. Abscissa: values of an, from -1.00 to 1.00; ordinate: probability of survival.]

When an is not large, the probability $2a/(1 - e^{-4an})$ tends to the small
but finite value 1/2n, as a tends to zero, and is finite even for negative
values of a; its value changes, however, very rapidly as we pass from
small negative to small positive selective advantages. Fig. 2 shows the
course of this change. It will be observed that the probability of
success increases over fifty fold ($e^4$) in passing from an = -1 to an = +1,
that is from distribution A to distribution C of fig. 1.
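The behaviour of the probability of success may be illustrated numerically. The sketch below is a hedged illustration: the function names are assumptions, the relation c = 1 + a is taken as the link between the mean number of offspring and the selective advantage, and the root of $cP = -\log(1-P)$ is found by simple iteration of $U = e^{c(U-1)}$ from U = 0, a device chosen here for convenience and not drawn from the paper.

```python
import math

def p_success(a, n):
    """2a / (1 - exp(-4an)), with the limiting value 1/(2n) as a tends to zero."""
    x = 4.0 * a * n
    if abs(x) < 1e-12:
        return 1.0 / (2.0 * n)
    # -expm1(-x) equals 1 - exp(-x) without loss of precision for small x
    return 2.0 * a / (-math.expm1(-x))

def p_branching(c, max_iter=100000):
    """Positive root P of c*P = -log(1 - P), by iterating U = exp(c*(U - 1))."""
    U = 0.0
    for _ in range(max_iter):
        U_next = math.exp(c * (U - 1.0))
        if U_next == U:
            break
        U = U_next
    return 1.0 - U

n = 1000  # hypothetical population number
print(p_success(1.0 / n, n) / p_success(-1.0 / n, n))  # ratio between an = +1 and an = -1
print(p_success(0.01, 10**9))    # large population, 1 per cent. advantage
print(p_branching(1.01))         # branching-process value for c = 1.01
```

The first figure printed is the ratio $e^4$ referred to in the text; the last two may be compared with the approximate value 2a.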


8. Contributions to the Variance. 


In previous work the calculations of the quantitative contribution of
different classes of factors to the total variance of the species has been
much complicated by the widespread phenomenon of dominance, and by
our ignorance of the conditions under which factors may be expected to
be dominant or recessive. With the extension of genetical experience it
now seems probable that the recessive character is characteristic of
deleterious mutations which have long persisted in regular occurrence in
the species or group in which they are known; and in the case of stable
dimorphism, determined by a simple Mendelian factor, of the less favoured
of the two phenotypes (genetic selection being necessarily absent or
balanced in such cases). Consequently, it is probable that the new and
sometimes favourable mutations on which evolutionary progress must
rely are neither dominant nor recessive, but have heterozygotes of an
intermediate character. Their contribution to the variance will then be
simply proportional to pq or to $\sin^2\theta$, and the total variance supplied by
mutations having a selective advantage a, for each one occurring per
generation, will be proportional to

$$\int_0^{\pi}\frac{1 - e^{-2an(1+\cos\theta)}}{1 - e^{-4an}}\,\sin\theta\,d\theta,$$

or to

$$\frac{2}{1 - e^{-4an}} - \frac{1}{2an}.$$

For negative values of an exceeding 2 this is nearly equal to -1/2an, while
for large positive values it approaches a constant value of 2, passing through
the value unity when a = 0. Its course is shown in fig. 3. If in the
immediate neighbourhood of neutrality beneficial and harmful mutations
are equally frequent, the variance contributed by mutations in a given
range of utility will increase sharply as the utility is increased past the
point of neutrality. For higher values of a there is every reason to
suppose that the supply of mutations falls off, so that there will be a maximum
in the contributions to the specific variance ascribable to slightly beneficial
mutations. The frequency of harmful mutations probably increases
considerably with the extent of the injury up to high values of -an; in spite
of the decrease in the average contribution of each mutation to the specific
variance, there may thus well be a second maximum, representing the
contribution of definitely deleterious mutations which are constantly kept rare
by counter-selection. This latter maximum is of no direct importance for
evolutionary change, though the effects of Natural Selection in reducing
persistent mutants of this class to the recessive condition seem to be of the
greatest interest. The portion of the genetic variance to which evolutionary
progress is to be ascribed may be a large or a small portion of the whole
observable variance, but seems in any case to be concentrated in groups of
factors each determining a very minute selective advantage.

[Fig. 3. Proportionate contribution to the specific variance for factors of varying
selective advantage.]
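The limiting behaviour claimed for the expression $2/(1 - e^{-4an}) - 1/(2an)$ can be checked by direct evaluation. The following is a minimal sketch with an assumed function name; near an = 0 it falls back on the first two terms of the power series, 1 + 2an/3, to avoid the cancellation of two large quantities.

```python
import math

def variance_contribution(an):
    """2 / (1 - exp(-4an)) - 1 / (2an), continued through an = 0 by its series."""
    if abs(an) < 1e-6:
        return 1.0 + 2.0 * an / 3.0   # leading terms of the expansion about an = 0
    return 2.0 / (-math.expm1(-4.0 * an)) - 1.0 / (2.0 * an)

for an in (-5.0, -1.0, 0.0, 1.0, 5.0, 50.0):
    print(an, variance_contribution(an))
```

The values rise from nearly -1/2an on the negative side, through unity at neutrality, towards the constant 2 on the positive side.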




9. Summary. 

The discussion of the distribution of the gene ratio of the author's paper
of 1922 is amended by the use of a more exact form of the differential
equation to be satisfied. It appears that the time needed to halve the
variance by random extinction of genes in the total absence of mutations
should be 1.4 instead of 2.8 times the number of potential parents in each
generation. Either value shows that the loss of variance due to this cause
is too trifling to be appreciable in the balance of causes which maintain
the actual genetic variability of species.

The same correction alters the distribution appropriate for the
maintenance of variability at a fixed level by mutations in the absence of
selection. The new solution closely resembles the form previously obtained
and now confirmed for the practical case in which selection is present. The
method of differential equations, however, fails to deal satisfactorily with
these cases, owing to the failure of the integrals to converge at the termini
representing cases in which one or other allelomorph is extremely rare.

A method of functional equations is developed for dealing with the
termini, and is shown to lead to the same solutions as the amended
differential equations in the central portion of the range for which the latter are
valid, and further to give the terminal distribution of rare allelomorphs.
The method requires the investigation of a continuous function $u_\nu$ of
argument $\nu$ satisfying the recurrence formula

$$u_{\nu+1} = e^{u_\nu - 1}.$$

From the asymptotic form of this function its expansion in the
neighbourhood of u = 0 is derived, giving the frequencies of the required
distributions.

Exceedingly minute values for the selective advantage or disadvantage 
make a great difference to (i) the chance of success of a mutation and (ii) 
the contribution of such mutations to the specific variance. The order 
of magnitude to be considered is the inverse of the population of the species. 
The neutral zone of selective advantage in the neighbourhood of zero is thus 
so narrow that changes in the environment, and in the genetic constitution 
of species, must cause this zone to be crossed and perhaps recrossed relatively 
rapidly in the course of evolutionary change, so that many possible gene 
substitutions may have a fluctuating history of advance and regression 
before the final balance of selective advantage is determined. 



30.198a 


20 

MOMENTS AND PRODUCT MOMENTS OF 
SAMPLING DISTRIBUTIONS 

AUTHOR’S NOTE 

To study frequency distributions with generality a number of writ¬ 
ers, such as Thiele, had been led to give attention to the symmetric 
functions of samples from such distributions. The most obvious of 
these are the moments. It was gradually realised that the S 3 anmetric 
functions of the distributions of such functions were necessarily ex¬ 
pressible in terms of corresponding fimctions of the parent distribu¬ 
tion. That is that a symmetric function of degree a of a symmetric 
function of degree r of a sample must be expressible as a symmetric 
function of degree rs. The pioneers in the development of the rele¬ 
vant formulae were Sheppard and ‘‘Student,’’ but by about 1919 
an immense amount of algebraic material of this sort had been pub¬ 
lished by Tchrouproff, using what app>eared to the author a very 
clumsy approach. C. C. Craig had indeed called attention to the 
need, if the algebraic formulation was to be made manageable, of 
the use of functions other than crude moments. 

In this paper are defined the functions which provide the neces¬ 
sary simplification, namely the symmetric functions A;, the mean 
values of which are unconditionally equal to the cumulants of the 
parent distribution. On the basis of the simpler forms so obtained 
for some of the expressions already known, and others which are 
made easily accessible, the paper develops an approach in which the 
mechanical simplification of overwhelming algebraic formulae is.re¬ 
placed by a consideration of the properties of certain bipartitional 
fimctions, which, apart from the sample number n, are purely arith¬ 
metical. 

This form of approach has the advantage that it is immediately 
applicable to the complex extension offered by bivariate and multi¬ 
variate distributions, for we have merely to consider under the same 
rules of procedure the bipartitions of multipartite numbers to obtain 
for them equivalent formulae. Section 11 on measures of departure 
from normality may be ignored, as it has been superseded by a more 
exact treatment (Paper 21). Complete univariate formulae are given 
up to the 10th degree; these afford a commodious check for developing
any multivariate formulae that may be required.

* Reprinted from Proceedings of the London Mathematical Society, Series 2,
Vol. 30, Pt. 3, pp. 199-238, 1928.





MOMENTS AND PRODUCT MOMENTS OF SAMPLING
DISTRIBUTIONS 

By R. A. Fisher. 


1. Introductory. 

If a random sample of n observations be taken from a univariate
distribution, and the sample values obtained be designated by
$x_1, x_2, \dots, x_n$,
then any symmetric function of these sample values of degree r may 
be termed a moment function of the sample of the r-th degree. If the 
coefficients of the symmetric function involve the sample number n 
in such a way that, as n tends to infinity, the value of the function 
tends to a finite limit, in the sense that the probability of exceeding 
or falling short of that limit by a positive quantity $\varepsilon$, however small,
tends to zero, then the limit to which it tends is a moment function 
of the population sampled, and the moment function of the sample 
may be regarded as a statistical estimate of the corresponding moment 
function of the population. 

If we consider the random sampling distribution of such a statistic 
it is evident that the moment functions of this distribution will be ex¬ 
pressible in terms of the moment functions of the original distribution, 
in so far as these are finite, by means of formulae which will be inde¬ 
pendent of the nature of this distribution. For example, a moment 
function of degree s of the sampling distribution of a moment function 
of degree r will involve only symmetric functions of the observations 
of degree rs, and will therefore be expressible as a moment function of 
the population of this degree, irrespective of the moments of higher 
degree. 





Numerous researches have been made into the moments, chiefly of
the second order, of moment statistics. The algebraic method was
developed by Sheppard [1], and used extensively by Pearson [2, 3] and
Isserlis [4, 5]; in all these researches, however, owing to the
supposition that the mean of the sample coincides with the mean of the
population, or for other similar reasons, the results are only first
approximations neglecting $n^{-2}$. In 1913 [6] Soper obtained a number
of approximations as far as $n^{-2}$. In 1908 "Student" [7] derived an
exact formula for the second moment of the variance as estimated, which
corresponds in a different notation to equation (1) of this paper for
the univariate case. Later, much work, by the exact algebraic method,
was carried out by Tchouproff [8], who obtained in this way the first
eight moments of the mean, in addition to the univariate formulae
corresponding to numbers (5) and (14). Tchouproff's version of (14) in
the univariate problem was subsequently corrected by Church [9]. The
application of the combinatorial method developed below to the general
moments of the distribution of statistics of the second degree from
normal multivariate populations has already appeared in a paper by
J. Wishart [11].

Apart from the last, these results are subject to two somewhat 
serious limitations; the great complexity of the results attained detracts 
largely from the possibility either of a theoretical comprehension of 
their meaning, or of numerical applications; it has also led to great 
difficulties in the detection of errors, which have had on more than one 
occasion to be corrected by subsequent workers. Secondly, partly no 
doubt in consequence of this complexity, attention has been almost 
solely confined to the direct moments of single statistics, and the product 
moments, specifying the simultaneous distribution of two or more 
statistics, have been largely neglected. The total number of formulae 
of degree no higher than 12 is large, and it is scarcely possible that the 
whole body should be made available, either for study or for use, unless 
an improved notation can be found which will greatly simplify the 
algebraic expressions. It will be shown that the formulae are much
simplified by the use of the cumulative moment functions, or
semi-invariants, in place of the crude moments.

The importance of the formulae lies in their generality; they are 
applicable to all distributions for which the expressions have a mean¬ 
ing. In the present state of our knowledge any information, however 
incomplete, as to sampling distributions is likely to be of frequent use, 
irrespective of the fact that moment functions only provide statistical 
estimates of high efficiency for a special type of distribution [10]. 





2. The cumulative moment functions. 

If the probability that a single sample value falls in the range dx is

$$\phi(x)\,dx,$$

then the function

$$M = \int e^{tx}\,\phi(x)\,dx,$$

taken over all possible values of the variate x, may, or may not, have a
meaning for real values of t. If it has a meaning we may expand the
exponential term, and, writing

$$\mu_r = \int x^r\,\phi(x)\,dx,$$

we have

$$M = 1 + \mu_1 t + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \cdots$$

If we expand the logarithm of M in powers of t we may write

$$K = \log M = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \cdots,$$

where the cumulative moment functions $\kappa$ are determinate functions of
the moments $\mu$, whether the series converges or not; moreover, since
$\kappa_r$ involves only $\mu_r$ and lower orders, it follows that, if
$\mu_1, \dots, \mu_r$ are finite, so will $\kappa_1, \dots, \kappa_r$ be finite.

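The expression of $\kappa_r$ in terms of $\mu$ will involve the term

$$\mu_{p_1}^{\pi_1}\,\mu_{p_2}^{\pi_2}\cdots$$

corresponding to any partition

$$(p_1^{\pi_1}\,p_2^{\pi_2}\cdots)$$

of the integer r, with coefficient

$$(-)^{\rho-1}(\rho-1)!\;\frac{r!}{\pi_1!\,\pi_2!\cdots(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots},$$

where $\rho = S(\pi)$ is the number of parts.

Similarly, the expression of $\mu_r$ in terms of $\kappa$ will involve the term

$$\kappa_{p_1}^{\pi_1}\,\kappa_{p_2}^{\pi_2}\cdots$$

with coefficient

$$\frac{r!}{\pi_1!\,\pi_2!\cdots(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots}.$$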


The simplification of moment formulae obtained by referring the
moments to the mean of the distribution is due to the fact that, when
$\kappa_1 = 0$, no subsequent $\mu$ involves $\kappa_1$, and the number of
partitions required is much reduced; thus

$$\mu_2 = \kappa_2, \qquad \mu_3 = \kappa_3, \qquad \mu_4 = \kappa_4 + 3\kappa_2^2,$$

and so on. The advantage of this simplification may be carried to higher
orders by consistently using the cumulative moment functions $\kappa$ in place
of the moments $\mu$.
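
The relation between moments and cumulative moment functions can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the standard recursion obtained by differentiating M = exp(K), and the function name `cumulants_from_moments` and the three-point distribution are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from math import comb


def cumulants_from_moments(mu):
    """Convert moments about the origin into cumulants.

    mu[r] is the r-th moment (mu[0] must be 1). The result has the same
    length, with kappa[r] the r-th cumulant and kappa[0] unused.
    """
    kappa = [Fraction(0)] * len(mu)
    for r in range(1, len(mu)):
        # mu_r = sum_{j=1..r} C(r-1, j-1) * kappa_j * mu_{r-j}, solved for kappa_r
        lower = sum(comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r))
        kappa[r] = mu[r] - lower
    return kappa


# Hypothetical three-point distribution, chosen so that the mean is zero.
values = [Fraction(-1), Fraction(0), Fraction(2)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
mu = [sum(p * v**r for v, p in zip(values, probs)) for r in range(7)]
kappa = cumulants_from_moments(mu)
```

Because the mean is zero here, the moments should agree with the reduced expressions in the cumulants, for instance mu[4] with kappa[4] + 3 kappa[2]^2. Exact rational arithmetic is used so that the comparison does not depend on rounding.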

The cumulative moment functions supply an immediate solution of
the problem of the distribution of the mean, for, using the well known
cumulative property, that, if x and y are independent variates,

$$K(x+y) = K(x) + K(y),$$

where K(x) stands for the K function specifying the distribution of x,
we find that, if $s_1 = S(x)$ is the sum of n independent values
constituting a sample from a given distribution, then

$$K(s_1) = nK(x) = n\kappa_1 t + n\kappa_2 \frac{t^2}{2!} + n\kappa_3 \frac{t^3}{3!} + \cdots;$$

but the mean is $\bar{x} = (1/n)s_1$; consequently the K function of the mean
is found by substituting t/n for t in the series for $K(s_1)$, giving

$$K(\bar{x}) = \kappa_1 t + \frac{1}{n}\kappa_2 \frac{t^2}{2!} + \frac{1}{n^2}\kappa_3 \frac{t^3}{3!} + \cdots$$

The value of $\kappa_r$ in the distribution of the mean is thus found from
that of the sampled distribution by dividing by $n^{r-1}$.
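
The division by n^(r-1) can be confirmed by brute force for a small discrete population. This sketch is an illustration only: the population `dist`, the sample size and all function names are assumptions, and the sampling distribution of the mean is built by enumerating every ordered sample.

```python
from fractions import Fraction
from itertools import product
from math import comb


def raw_moments(dist, order):
    """Moments about the origin, orders 0..order, of a {value: probability} map."""
    return [sum(p * v**r for v, p in dist.items()) for r in range(order + 1)]


def cumulants(mu):
    """Cumulants from raw moments by the standard recursion (index 0 unused)."""
    kappa = [Fraction(0)] * len(mu)
    for r in range(1, len(mu)):
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa


def mean_distribution(dist, n):
    """Exact distribution of the mean of n independent draws from dist."""
    out = {}
    for combo in product(dist.items(), repeat=n):
        mean = sum(v for v, _ in combo) / n
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        out[mean] = out.get(mean, 0) + prob
    return out


# Hypothetical population and sample size.
dist = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(3): Fraction(1, 6)}
n = 3
k_pop = cumulants(raw_moments(dist, 4))
k_mean = cumulants(raw_moments(mean_distribution(dist, n), 4))
```

Each entry of `k_mean` should equal the corresponding entry of `k_pop` divided by n^(r-1).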


3. The appropriate moment statistics. 

In order to take the full advantage of the properties of the cumulative
moment functions, it is necessary to introduce a modification also into
the form of the moment statistics; it is usual to employ statistics which
may be written

$$m_r = \frac{1}{n}\,S(x-\bar{x})^r,$$

which are called the moments of the sample about its mean, together
with the mean itself, $\bar{x}$. These moments may be expressed in terms of
the symmetric functions defined by

$$s_r = S(x^r),$$

by direct expansion; for example,

$$\bar{x} = n^{-1}s_1, \qquad m_2 = n^{-1}s_2 - n^{-2}s_1^2,$$

and so on. While the coefficients $n^{-1}$, etc., are kept simple, we here
encounter the complication that the mean value of $m_r$ is not in finite
samples equal to $\mu_r$; in order that this should be so we should multiply
$m_2$ by n/(n-1), and $m_3$ by $n^2/[(n-1)(n-2)]$; further, for functions of the
fourth and higher degrees, $\kappa_r$ is not a linear function of the moments $\mu$,
and, in consequence, a moment statistic of which the mean is $\kappa_r$ will not
be exactly the same function of moment statistics, of which the means
are $\mu_r$, as $\kappa_r$ is of $\mu_r$. As a preliminary step, therefore, to the
simplification of the formulae to be obtained, it will be desirable to
obtain, in terms of the direct summation values s, the moment statistics
of each degree of which the sampling means shall be
$\kappa_1, \kappa_2, \kappa_3, \dots$. They will be represented by
$k_1, k_2, k_3, \dots$.

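The first few statistics which fulfil this condition are

$$k_1 = m_1 = \frac{1}{n}s_1,$$

$$k_2 = \frac{n}{n-1}\,m_2 = \frac{1}{n-1}\left(s_2 - \frac{1}{n}s_1^2\right),$$

$$k_3 = \frac{n^2}{(n-1)(n-2)}\,m_3 = \frac{n}{(n-1)(n-2)}\left(s_3 - \frac{3}{n}s_2 s_1 + \frac{2}{n^2}s_1^3\right),$$

$$k_4 = \frac{n^2}{(n-1)(n-2)(n-3)}\left\{(n+1)m_4 - 3(n-1)m_2^2\right\}$$
$$\quad = \frac{n}{(n-1)(n-2)(n-3)}\left\{(n+1)s_4 - \frac{4(n+1)}{n}s_3 s_1 - \frac{3(n-1)}{n}s_2^2 + \frac{12}{n}s_2 s_1^2 - \frac{6}{n^2}s_1^4\right\},$$

$$k_5 = \frac{n^3}{(n-1)(n-2)(n-3)(n-4)}\left\{(n+5)m_5 - 10(n-1)m_3 m_2\right\}$$
$$\quad = \frac{n^2}{(n-1)(n-2)(n-3)(n-4)}\left\{(n+5)s_5 - \frac{5(n+5)}{n}s_4 s_1 - \frac{10(n-1)}{n}s_3 s_2 + \frac{20(n+2)}{n^2}s_3 s_1^2 + \frac{30(n-1)}{n^2}s_2^2 s_1 - \frac{60}{n^2}s_2 s_1^3 + \frac{24}{n^3}s_1^5\right\},$$

$$k_6 = \frac{n^2}{(n-1)\cdots(n-5)}\left\{(n+1)(n^2+15n-4)m_6 - 15(n-1)^2(n+4)m_4 m_2 - 10(n-1)(n^2-n+4)m_3^2 + 30n(n-1)(n-2)m_2^3\right\}$$
$$\quad = \frac{n}{(n-1)\cdots(n-5)}\Big\{(n+1)(n^2+15n-4)s_6 - \frac{6(n+1)(n^2+15n-4)}{n}s_5 s_1 - \frac{15(n-1)^2(n+4)}{n}s_4 s_2 - \frac{10(n-1)(n^2-n+4)}{n}s_3^2$$
$$\qquad + \frac{30(n^2+9n+2)}{n}s_4 s_1^2 + \frac{120(n^2-1)}{n}s_3 s_2 s_1 + \frac{30(n-1)(n-2)}{n}s_2^3 - \frac{120(n+3)}{n}s_3 s_1^3 - \frac{270(n-1)}{n}s_2^2 s_1^2 + \frac{360}{n}s_2 s_1^4 - \frac{120}{n^2}s_1^6\Big\}.$$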

If these be employed we have not only the result that the r-th
cumulative moment function of the mean is $n^{-(r-1)}\kappa_r$, but also that the
mean of $k_r$ is $\kappa_r$, thus reducing a second group of the required formulae
to its simplest form. It is, however, the effect of their use upon the
more complex formulae which is of the greater importance. The general
structure of k for any degree will be elucidated in § 10.
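
The defining property of the statistics k, that their sampling means are the cumulative moment functions of the population, can be verified exactly for small cases. The sketch below is illustrative only: it assumes the power-sum forms of k_1 to k_4 written over a common denominator, a hypothetical three-point population, and samples of four independent draws enumerated in full.

```python
from fractions import Fraction
from itertools import product
from math import comb


def k_statistics(sample):
    """k_1..k_4 from the power sums s_r = S(x^r); needs at least 4 values."""
    n = len(sample)
    xs = [Fraction(x) for x in sample]
    s1, s2, s3, s4 = (sum(x**r for x in xs) for r in (1, 2, 3, 4))
    k1 = s1 / n
    k2 = (n * s2 - s1**2) / (n * (n - 1))
    k3 = (n**2 * s3 - 3 * n * s2 * s1 + 2 * s1**3) / (n * (n - 1) * (n - 2))
    k4 = (
        n**2 * (n + 1) * s4 - 4 * n * (n + 1) * s3 * s1
        - 3 * n * (n - 1) * s2**2 + 12 * n * s2 * s1**2 - 6 * s1**4
    ) / (n * (n - 1) * (n - 2) * (n - 3))
    return [k1, k2, k3, k4]


def population_cumulants(dist, order):
    """Cumulants 1..order of a {value: probability} map (standard recursion)."""
    mu = [sum(p * v**r for v, p in dist.items()) for r in range(order + 1)]
    kappa = [Fraction(0)] * (order + 1)
    for r in range(1, order + 1):
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa[1:]


def expected_k_statistics(dist, n):
    """Exact sampling means of k_1..k_4 over all ordered samples of size n."""
    totals = [Fraction(0)] * 4
    for combo in product(dist.items(), repeat=n):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        ks = k_statistics([v for v, _ in combo])
        totals = [t + prob * k for t, k in zip(totals, ks)]
    return totals


# Hypothetical population.
dist = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(3): Fraction(1, 6)}
means = expected_k_statistics(dist, 4)
kappas = population_cumulants(dist, 4)
```

The two lists `means` and `kappas` should agree term by term; the same check with the sample moments m_r in place of k_r would fail for finite n.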


4. The aggregate of moment sampling formulae. 

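If we consider in its full generality the simultaneous distribution in
random samples of the statistics $k_1, k_2, k_3, \dots$, it is clear that we can
represent it by means of cumulative moment functions analogous to those
developed for a single variate. To any partition

$$(p_1^{\pi_1}\,p_2^{\pi_2}\cdots)$$

of the number r, there will correspond a moment

$$\mu(p_1^{\pi_1}\,p_2^{\pi_2}\cdots) = \text{mean value of } k_{p_1}^{\pi_1}k_{p_2}^{\pi_2}\cdots,$$

and, if we write

$$M = \sum \mu(p_1^{\pi_1}\,p_2^{\pi_2}\cdots)\,\frac{t_{p_1}^{\pi_1}\,t_{p_2}^{\pi_2}\cdots}{\pi_1!\,\pi_2!\cdots},$$

the expansion in terms of $t_1, t_2, \dots$ of K = log M assumes the form

$$K = \sum \kappa(p_1^{\pi_1}\,p_2^{\pi_2}\cdots)\,\frac{t_{p_1}^{\pi_1}\,t_{p_2}^{\pi_2}\cdots}{\pi_1!\,\pi_2!\cdots}.$$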

There will thus be a separate formula of degree r for every partition
of the number r, and for the complete specification of the distribution
each must be expanded in terms of the cumulative moment functions of
the sampled population. For example, the semi-invariants of the
distribution of the second moment statistic will be given by the terms
corresponding to the partitions $(2), (2^2), (2^3), (2^4), \dots$, which we
designate by

$$\kappa(2),\ \kappa(2^2),\ \kappa(2^3),\ \kappa(2^4),\ \text{and so on.}$$


The well known solution of the distribution of the mean, given above,
may now be written

$$\kappa(1^r) = n^{-(r-1)}\kappa_r, \qquad \text{(I)}$$

while from the manner in which the statistics k have been constructed
we have also

$$\kappa(r) = \kappa_r. \qquad \text{(II)}$$

In general, the expression for the $\kappa$ corresponding to any given
partition of r will include a term in $\kappa_r$ together with terms of the form

$$\kappa_{q_1}^{\chi_1}\,\kappa_{q_2}^{\chi_2}\cdots,$$

where $(q_1^{\chi_1}\,q_2^{\chi_2}\cdots)$ is a partition of r in which no part is unity.

This restriction, which greatly diminishes the number of terms to be
evaluated, flows from the consideration that $\kappa_1$, unlike all other
cumulative moment functions, is altered by a change of origin, and by
such a change can be given any desired value, while of the moment
statistics also $k_1$ is the only one affected by such a change, and that by
addition of a quantity which is invariable from sample to sample;
consequently, $\kappa_1$ can only appear in the single formula

$$\kappa(1) = \kappa_1,$$

expressing that the mean of the sample of n will be the mean of the
population.


5. Partitions involving unit parts. 

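A relationship exists, of which a proof may be deduced from the
general theory to be developed, which enables us to dispense with the
separate examination and tabulation of the formulae corresponding to
all those partitions which involve unit parts. The effect upon the
corresponding formula of adding a new unit part to the partition is (1) to
modify every term in the formula by increasing the suffix of one of its
$\kappa$ functions by unity in every possible way, and (2) to divide the whole
by n. For example, the formula for the variance of $k_2$ is

$$\kappa(2^2) = \frac{1}{n}\kappa_4 + \frac{2}{n-1}\kappa_2^2, \qquad (1)$$

whence we may deduce, by applying the above rules,

$$\kappa(2^2 1) = \frac{1}{n^2}\kappa_5 + \frac{4}{n(n-1)}\kappa_3\kappa_2,$$

and, by further applications,

$$\kappa(2^2 1^2) = \frac{1}{n^3}\kappa_6 + \frac{4}{n^2(n-1)}\kappa_4\kappa_2 + \frac{4}{n^2(n-1)}\kappa_3^2,$$

$$\kappa(2^2 1^3) = \frac{1}{n^4}\kappa_7 + \frac{4}{n^3(n-1)}\kappa_5\kappa_2 + \frac{12}{n^3(n-1)}\kappa_4\kappa_3,$$

and so on.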
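
The rule for adding a unit part is mechanical enough to be sketched in a few lines. The representation below is an assumption made for illustration: a formula is stored as a map from the tuple of kappa suffixes in each term to its numerical coefficient, and the division of the whole by n at each step is left to be recorded separately, since only the numerical coefficients change in a non-trivial way.

```python
from collections import Counter


def add_unit_part(formula):
    """Apply step (1) of the rule: raise one suffix by unity in every possible way.

    `formula` maps a tuple of kappa suffixes (one term) to its numerical
    coefficient. Step (2), division of the whole by n, is not represented.
    """
    out = Counter()
    for suffixes, coeff in formula.items():
        for i in range(len(suffixes)):
            raised = list(suffixes)
            raised[i] += 1
            out[tuple(sorted(raised, reverse=True))] += coeff
    return dict(out)


# Numerical coefficients of the variance of k_2: kappa_4 and 2 * kappa_2^2.
variance_k2 = {(4,): 1, (2, 2): 2}
step1 = add_unit_part(variance_k2)
step2 = add_unit_part(step1)
step3 = add_unit_part(step2)
```

Under these assumptions the three steps give the numerical coefficients 1 and 4, then 1, 4 and 4, then 1, 4 and 12, which are the coefficients of the three derived formulae.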

An immediate consequence of the same relationship is that

$$\kappa(p\,1^q) = n^{-q}\,\kappa_{p+q}. \qquad \text{(III)}$$


The number of formulae remaining of any degree r is the number
of partitions of r into parts of 2 or more (the partition into the
single part r being already disposed of by (II)); these are

    r            4   5   6   7   8   9  10  11  12  13  14  15  16  17

    partitions   1   1   3   3   6   7  11  13  20  23  33  40  54  65

Up to the 12th degree there are therefore 65 formulae, while 150 more
will only reach the 16th degree. It is proposed to put on record, as a
basis for discussion, the formulae up to the 10th degree, together with a
few others of special interest, with an explanation of the procedure of
calculation.
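
The counts in the table can be reproduced by a short enumeration. This is a sketch under one stated assumption: that the number of formulae of degree r is the number of partitions of r into parts of 2 or more, less one for the single-part partition (r). The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_partitions(total, smallest):
    """Number of partitions of `total` into parts each >= `smallest`."""
    if total == 0:
        return 1
    return sum(
        count_partitions(total - part, part) for part in range(smallest, total + 1)
    )


def formulae_of_degree(r):
    """Partitions of r into parts >= 2, excluding the single part (r) itself."""
    return count_partitions(r, 2) - 1


table = [formulae_of_degree(r) for r in range(4, 18)]
total_to_12 = sum(formulae_of_degree(r) for r in range(4, 13))
total_13_to_16 = sum(formulae_of_degree(r) for r in range(13, 17))
```

The totals `total_to_12` and `total_13_to_16` correspond to the 65 formulae up to the 12th degree and the 150 further formulae needed to reach the 16th.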


6. Calculation of formulae. 

In the calculation of the formulae by the algebraic method it is
desirable to proceed somewhat formally, although the results for the 4th
and 5th degrees may be obtained fairly readily by writing down the
algebraical expressions at length. The procedure may be illustrated by
the work for the formulae of the eighth degree. There will be six of
these, and corresponding to any of these, such as $\kappa(3^2 2)$, the k product,
$k_3^2 k_2$, may be written down and expanded in the symmetric functions s.
The work proceeds in three steps: (1) the mean value of the k product
is expressed in terms of the population moments $\mu$; (2) by substitution,
the expression in terms of $\mu$ is condensed into its equivalent in terms
of $\kappa$; (3) from the moment thus obtained, corresponding to the required
partition, the corresponding cumulative moment function is found by
the use of formulae of lower degree previously prepared.

The first step is carried out by means of easily verified relationships
giving the mean value of such a product as $s_p s_q s_r$ in the form

$$n\mu_{p+q+r} + n(n-1)\left(\mu_{p+q}\mu_r + \mu_{p+r}\mu_q + \mu_{q+r}\mu_p\right) + n(n-1)(n-2)\mu_p\mu_q\mu_r.$$

In order to apply these relationships expeditiously a table is prepared
for each degree, showing the coefficients with which each $\mu$ product,
ignoring $\mu_1$, occurs in the expansion of each product.
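
The mean value of a product of three power sums can be checked by exhaustive enumeration. The sketch below is illustrative only and assumes the usual three-group expansion, in which the observations entering the three sums are all the same, two the same, or all different; the population and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def enumerated_mean(dist, n, p, q, r):
    """Exact mean of s_p * s_q * s_r over all ordered samples of size n."""
    total = Fraction(0)
    for combo in product(dist.items(), repeat=n):
        prob = Fraction(1)
        for _, pr in combo:
            prob *= pr
        xs = [v for v, _ in combo]
        sp = sum(x**p for x in xs)
        sq = sum(x**q for x in xs)
        sr = sum(x**r for x in xs)
        total += prob * sp * sq * sr
    return total


def formula_mean(dist, n, p, q, r):
    """The same mean from the population moments about the origin."""
    def mu(j):
        return sum(pr * v**j for v, pr in dist.items())

    return (
        n * mu(p + q + r)
        + n * (n - 1) * (mu(p + q) * mu(r) + mu(p + r) * mu(q) + mu(q + r) * mu(p))
        + n * (n - 1) * (n - 2) * mu(p) * mu(q) * mu(r)
    )


# Hypothetical population with mean zero, so that terms in mu_1 drop out.
dist = {-1: Fraction(1, 2), 0: Fraction(1, 4), 2: Fraction(1, 4)}
```

With a population of mean zero, as here, every term containing mu_1 vanishes, which is why the tables of the text may ignore such terms.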

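To evaluate the mean value of any k product, such as $k_3^2 k_2$, it is
first expanded in s products as

$$k_3^2 k_2 = \frac{n^2}{(n-1)^3(n-2)^2}\Big\{s_3^2 s_2 - \frac{1}{n}s_3^2 s_1^2 - \frac{6}{n}s_3 s_2^2 s_1 + \frac{10}{n^2}s_3 s_2 s_1^3 + \frac{9}{n^2}s_2^3 s_1^2$$
$$\qquad - \frac{4}{n^3}s_3 s_1^5 - \frac{21}{n^3}s_2^2 s_1^4 + \frac{16}{n^4}s_2 s_1^6 - \frac{4}{n^5}s_1^8\Big\},$$

whence from a table of the separations of 8 the following table may at
once be constructed.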


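TABLE 1.

Calculation of the mean value of $k_3^2 k_2$.

Each column is headed by the $\mu$ product and by the function of n which
multiplies it; each row gives the contributions of one s product, with its
coefficient. A point denotes that no contribution arises.

                          n     n(n-1)  n(n-1)  n(n-1)  n(n-1)(n-2)  n(n-1)(n-2)  n(n-1)(n-2)(n-3)
                          mu_8  mu_6    mu_5    mu_4^2  mu_4         mu_3^2       mu_2^4
                                mu_2    mu_3            mu_2^2       mu_2

            s_3^2 s_2        1      1       2       .       .            1            .
    n^-1    s_3^2 s_1^2     -1     -1      -2      -2       .           -1            .
    n^-1    s_3 s_2^2 s_1   -6    -12     -18      -6      -6          -12            .
    n^-2    s_3 s_2 s_1^3   10     40      50      30      30           40            .
    n^-2    s_2^3 s_1^2      9     36      54      27      54           54            9
    n^-3    s_3 s_1^5       -4    -40     -44     -20     -60          -40            .
    n^-3    s_2^2 s_1^4    -21   -168    -252    -147    -336         -420          -63
    n^-4    s_2 s_1^6       16    256     416     240     960         1120          240
    n^-5    s_1^8           -4   -112    -224    -140    -840        -1120         -420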
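
The entries of Table 1 are the coefficients of the s products multiplied by the number of ways in which the factors of each product can be grouped so as to give the mu product at the head of the column. A minimal sketch of that counting follows. It assumes that terms containing mu_1 are dropped, as in the text, and it enumerates set partitions directly rather than using a table of separations; the function names are hypothetical.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` in each existing block in turn, then in a new block.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def separation_counts(factors):
    """Count groupings of the suffixes of an s product by resulting mu product.

    `factors` lists the suffixes, e.g. [3, 3, 2] for s_3^2 s_2. A grouping
    with block sums (6, 2) contributes to mu_6 mu_2. Groupings in which
    some block sums to 1 are dropped because mu_1 is taken as zero.
    """
    counts = Counter()
    for partition in set_partitions(list(factors)):
        sums = tuple(sorted((sum(block) for block in partition), reverse=True))
        if 1 not in sums:
            counts[sums] += 1
    return dict(counts)


row_1 = separation_counts([3, 3, 2])        # s_3^2 s_2
row_3 = separation_counts([3, 2, 2, 1])     # s_3 s_2^2 s_1, coefficient -6
row_9 = separation_counts([1] * 8)          # s_1^8, coefficient -4
```

Multiplying `row_3` by -6 and `row_9` by -4 gives the third and last rows of the table as reconstructed here.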


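Collecting like terms and cancelling the factors n-1 and n-2 whenever
possible, we get

$$\mu(3^2 2) = \frac{1}{n^2}\mu_8 + \frac{n^2-8n+28}{n^2(n-1)}\mu_6\mu_2 + \frac{2n^3-12n^2+48n-56}{n^2(n-1)^2}\mu_5\mu_3 + \frac{-8n^2+25n-35}{n^2(n-1)^2}\mu_4^2$$
$$\qquad + \frac{1}{n^2(n-1)^2(n-2)}\Big\{(-6n^4+84n^3-396n^2+960n-840)\mu_4\mu_2^2$$
$$\qquad + (n^5-13n^4+94n^3-460n^2+1120n-1120)\mu_3^2\mu_2$$
$$\qquad + (9n^4-90n^3+429n^2-1140n+1260)\mu_2^4\Big\}.$$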


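The second step consists in substituting

$$\mu_4 = \kappa_4 + 3\kappa_2^2,$$

$$\mu_5 = \kappa_5 + 10\kappa_3\kappa_2,$$

$$\mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3,$$

$$\mu_8 = \kappa_8 + 28\kappa_6\kappa_2 + 56\kappa_5\kappa_3 + 35\kappa_4^2 + 210\kappa_4\kappa_2^2 + 280\kappa_3^2\kappa_2 + 105\kappa_2^4,$$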

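which reduces the expression to the simpler form

$$\mu(3^2 2) = \frac{1}{n^2}\kappa_8 + \frac{n+20}{n(n-1)}\kappa_6\kappa_2 + \frac{2n^2+44n-64}{n(n-1)^2}\kappa_5\kappa_3 + \frac{27n-45}{n(n-1)^2}\kappa_4^2$$
$$\qquad + \frac{9n^2+81n-180}{(n-1)^2(n-2)}\kappa_4\kappa_2^2 + \frac{n^3+17n^2+104n-320}{(n-1)^2(n-2)}\kappa_3^2\kappa_2 + \frac{6n^2+30n}{(n-1)^2(n-2)}\kappa_2^4.$$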




The third stage consists in removing from $\mu(3^2 2)$ those terms which
do not belong to $\kappa(3^2 2)$; from the general relationship which connects
these two groups of functions

$$\mu(3^2 2) = \kappa(3^2 2) + 2\kappa_3\,\kappa(32) + \kappa_2\,\kappa(3^2) + \kappa_3^2\kappa_2,$$

and from formulae of lower degree already evaluated we know that

$$\kappa(32) = \frac{1}{n}\kappa_5 + \frac{6}{n-1}\kappa_3\kappa_2,$$

while

$$\kappa(3^2) = \frac{1}{n}\kappa_6 + \frac{9}{n-1}\kappa_4\kappa_2 + \frac{9}{n-1}\kappa_3^2 + \frac{6n}{(n-1)(n-2)}\kappa_2^3.$$

Removing the superfluous terms we are left with

$$\kappa(3^2 2) = \frac{1}{n^2}\kappa_8 + \frac{21}{n(n-1)}\kappa_6\kappa_2 + \frac{6(8n-11)}{n(n-1)^2}\kappa_5\kappa_3 + \frac{9(3n-5)}{n(n-1)^2}\kappa_4^2$$
$$\qquad + \frac{18(6n-11)}{(n-1)^2(n-2)}\kappa_4\kappa_2^2 + \frac{18(9n-20)}{(n-1)^2(n-2)}\kappa_3^2\kappa_2 + \frac{36n}{(n-1)^2(n-2)}\kappa_2^4,$$

an expression in which the part played by each of the characteristic
coefficients of the original distribution is clearly apparent. In the normal
distribution, for example, when every coefficient beyond $\kappa_2$ vanishes,
only the last term remains to be evaluated.
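
The two formulae of lower degree used in the third stage can themselves be verified by exact enumeration. The following sketch is illustrative only: it assumes the forms kappa(32) = kappa_5/n + 6 kappa_3 kappa_2/(n-1) for the product moment of k_2 and k_3, and kappa(3^2) = kappa_6/n + 9 kappa_4 kappa_2/(n-1) + 9 kappa_3^2/(n-1) + 6n kappa_2^3/((n-1)(n-2)) for the variance of k_3, with a hypothetical population and samples of three.

```python
from fractions import Fraction
from itertools import product
from math import comb


def k2_k3(sample):
    """The statistics k_2 and k_3 from the power sums of a sample."""
    n = len(sample)
    xs = [Fraction(x) for x in sample]
    s1, s2, s3 = (sum(x**r for x in xs) for r in (1, 2, 3))
    k2 = (n * s2 - s1**2) / (n * (n - 1))
    k3 = (n**2 * s3 - 3 * n * s2 * s1 + 2 * s1**3) / (n * (n - 1) * (n - 2))
    return k2, k3


def population_cumulants(dist, order):
    """Cumulants of a {value: probability} map; index 0 is unused."""
    mu = [sum(p * v**r for v, p in dist.items()) for r in range(order + 1)]
    kappa = [Fraction(0)] * (order + 1)
    for r in range(1, order + 1):
        kappa[r] = mu[r] - sum(
            comb(r - 1, j - 1) * kappa[j] * mu[r - j] for j in range(1, r)
        )
    return kappa


# Hypothetical population and sample size.
dist = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 3), Fraction(3): Fraction(1, 6)}
n = 3

e2 = e3 = e23 = e33 = Fraction(0)
for combo in product(dist.items(), repeat=n):
    prob = Fraction(1)
    for _, p in combo:
        prob *= p
    k2, k3 = k2_k3([v for v, _ in combo])
    e2 += prob * k2
    e3 += prob * k3
    e23 += prob * k2 * k3
    e33 += prob * k3 * k3

kap = population_cumulants(dist, 6)
cov_enumerated = e23 - e2 * e3
var_enumerated = e33 - e3**2
cov_formula = kap[5] / n + 6 * kap[3] * kap[2] / (n - 1)
var_formula = (
    kap[6] / n + 9 * kap[4] * kap[2] / (n - 1) + 9 * kap[3] ** 2 / (n - 1)
    + 6 * n * kap[2] ** 3 / ((n - 1) * (n - 2))
)
```

The same approach would serve as an independent check on any single term of a heavier formula, at the cost of enumerating larger samples.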


7. The univariate formulae. 

In addition to the partitions involving unit parts, which have already
been set aside, the numbers 4 and 5 have only one partition each, 6
and 7 have three partitions each, while 8, 9, and 10 bring the total
up to 32. These are given in the following Table. Since it is scarcely
to be hoped that all of these, especially the heavier formulae, will be
entirely free from error, it should be particularly noted that any suspected
term may be evaluated separately and independently by means of the
combinatorial method elaborated below. I am indebted to Dr. J. Wishart
and Prof. Hotelling for checking these formulae.

In addition to these formulae, which are complete up to the tenth
degree, four others of the twelfth degree may be put on record, namely
those for the variance of $k_6$, the third moment of $k_4$, fourth moment of $k_3$,


















TABLE OF FORMULAE (continued).


■■ 

*1(1 

*8*S 

KjK, 

*«*4 


»»*«*2 

^*5*8*1 


n 

n-1 

n-1 

n-1 

n-1 

(n-l)(n-2) 

(n-l)(n-2) 

Rml 

1 

16 

56 

112 

70 

— 

— 

RH| 

1 

21 

84 

168 

105 

126 

680 

MS] 

1 

24 

96 

194 

120 

180 



1 

25 


200 

125 

200 

ISOO 


1]0 

KnXf 

*7*3 

*8*1 

*s 

*«*5 

*8*3*8 


n“ 

n(n-D 

n(n-l)* 

n(n~iy^ 

n(n-l)‘^ 

(n-l)J(n-2) 

(n-l)*(n-2) 


1 

28 

12(7n-9) 

4 (41 u-56) 

20(5n-7) 

168 (n-2) 

840{n-2) 

ic(632) 

1 

31 

lOln-131 

5(87n-65) 

6(23n-36) 

30(9n-16) 

30(45n-92) 

K (4»2) 

1 

32 

8(13ji-17) 

4(49n-73) 

4(29n-46) 

8(37n-65) 

1536 (n-2) 

m 

1 

S3 

6(19n-25) 

3(65n-107) 

. 

6{19n-34) 

18(19n-33) 

72(28n-52) 

nil 

*in 

***i 

* 7*3 

(t«*. 

*2 
_ y 

*8*2 

*5*3*8 



n*(n-l) 

n*(n-lV 



n(n- l)*(n-2i 

n(n-l)»(n-2) 


1 

36 

4(23n-87) 

4(47»i«-120n + 8l) 

12(9n*-24n+17) 

360(n-2) 

288(6n-7){n-2) 


1 

37 

6{17n-27) 

3(61n2-166« + 117) 

2{59n*-154n+113) 

C(67n-181) 

24(71n*-246n + 202) 


*10 

* 8*3 

*7*:t 

_*«*4 _ 

k'^ 

*s 

_*6_'^2_ 

*5*3*8 



n>(n-l) 

n!‘(n-l)* 

n*(n-l)* 

n*(n-l)'* 

n-(n-l)* 

n-(n-l)* 

EhI 

1 

40 

80 (n-2) 

40(5»*-12n + 9) 

16(n-2)(6n*-12u+ 7) 

480 

1280 (n-2) 


and the sixth moment of $k_2$. These are:

*( 6 “) = •if„+^( 86 <,„«j+ 180 /r,«r 3 + 465 ,r,f,+ 780 it,*,+ 461 /ci) 


+ 


+ /- :rr, - 3 ; (ISO^fg kI + 36 OO/C 7 +7200/^6 K^ /cj+6300/f^ /cj 

+4 500/f j «'2 4" 21 OOO^fj 4* 4950/f*) 

(„_,KriII„_ 3 ) (2400..4+21(i00.,,.,.» 

+16SOO-tV8+54000/c,/r»/fj+8100/f;) 

+( 54 oo.,.H 2 ieoo.J.') 

720/r“ (50) 


+ 


(«-l)(«- 2 )(«- 3 )(«- 4 ) 

)i()t+l)(»Hl 5 «- 4 ) 

(n — 1 ) (h— 2 ) («— 3 ) (»— 4 )(k— 5 ) 


*( 4 ’)=T<tij+ , ,, 

n ' H(n— 1 ) 


48 lfi( 13 H- 17 ) , 12 ( 41 m- 66 ) 


, 48 ( 16 n- 29 ) , 12 f 37 «- 70 ) , , 72 (l]»- 19 ) , , 














nic^K* 

n(n + l)ic^it* 

n(n4 l)/t|*i{ 

n>(n + 5)«5 


(n-l)(n-2) 

(n-l)(n-2) 

(n-l)(n-2)(»-3) 

(n-l)(n-2)(n-3) 

(n-l)(n-2)(n-3)(n-4) 


— 

— 

— 

— 


(22) 

420 

080 

— 

— 

— 

(28) 

720 

1260 

480 

1080 

_ 

(24) 

850 

1200 

600 

1800 

120 

(26) 

*4*a 

K^Kl 

* 4 * 2 * 

kIk] 



(n-l)-'(n-2) 

(n-l)*(n-2) 

(n-l)*(n-2) 

(n-l)*(n-2) 



660(n-2) 

840 (n-2) 

— 

— 


(26) 

60(16n-81) 

30(46n-103) 

720n 

1620» 

_ 

(27) 

144(7n-16) 

72{21n-50) 

96(10»’-27n-l) 

144{17n*-68n-2) 

192 (n 41) 




n—3 

n-3 

n-S 

(28) 

64a9n-48) 

64(33n»-148n + 172) 

72n(17n-40) 

108n(27n-70) 

216n 



n-2 

n-2 

'n-2 

n-2 

(29) 

»'K, 

K^K^ 

K^K| 

If*'* If® 

flK^ 


n(n-l)»(n-2j 

»(n~l)»(n-2) 

(n-l)»(n-2) 

(»-l)*(n-2) 

(n-l)»(n-2) 


144(7n-10)(n~2) 

24(49n-.96)(jt-2) 

960(n-2) 

2 l 60 (n- 2 ) 

_ 

(80) 

86(29n*-103n + 93) 

36(88n*-155n+160) 

72(14n-23) 

144(l9n-44) 

288 

(81) 

«4Kj 


* 1*2 

kIk] 




n*(n-l)* 

nln-l)'* 

n(n-l)< 

(n-D* 


820(4n*-9n + 6) 

480(2n*-77t + 6) 

1920 

1920(n-2) 

384 

(82) 


, 288(19w~41) . 48(203/1-523) 


144(56h’-257m+802) , . 1440(4«-11) , 


, 1162(22»*-106n+188) , 8(709n®-8480n+44r)6) 3 


288(19n"-98/t’'+125n+2) , . 1728{24n'’-iW+2oon+4) , 

(n-l)‘(»i-2)’(«-3) («- 2 )'‘(«- 2 f(rt- 8 ) 


482(49tt*-287»i''+408n+12) , , . 864(103«''-629»tH984»+24) , 

+ {n-lfin-ifin-Sf (n-lf(K-2)‘‘l)t-3) 


, 288(41?t‘~384M’+1209»t’'-1282n-S6) , 288(89»'‘-823n-88)n , 

(rt-l)'(»- 2 )*(»- 8 )' (n-l)’(n- 2 )’‘(n- 8 ) 


. 1728(29«“-196)tH317«+62)» „ . , 1728(»'‘-5«+2)(n+l))t , 
+ (n-l)’(n-2)'‘(»-3)* (»-l)“(«-2)“(«-8)* 








108(2n~8) 


17n*—49n4-35 


^ ““ n» 1) n“(n—1)» „a(„_i)8 ^ 8^:4 

. 7n*—20n4-16 , 17n*—47n-f 89 , , 37?i--70 

+ ^°® n»(«-l)» n>*(»-l)* **+27 «s*a 

, 19»»-67n+S4 . 66»»-245w+234 

n(»—l)“(n—2) n(« —l)»{>t—2) 

, 82n“—481n’'+968n-640 . . 69»“—220n+224 , 

+ 108- „(n^i 7r (n -- :r2^ -*.<i+108 

. or.. 76«*—478n*+1016fi —766 

-„(„-l)*(„-r2)^- 

. „„ 173»i*—1603TO®+4962n*—7380^+4200 , 

+ -n(n-l7(~-2)»- 

, 71n‘‘—263n+284 , , 79«»-348w+378 

+ 108 (^_i)8(„_2)=‘ *0*2+848 („_i)a(„_2)2 *o*»*2 


+ 486 *0*2 + 972- 


(»»—l)®0i—2)“ 


+ 162 87»'‘-594n»+1420n-n76 , 29»»-121»»+118 , 

+ 1®^ (n— l)»(»i-2)» *8+972 *4** 

. 103«’' —610M+640 5n—12 » 

+648>t (62) 

- A -4- . 00 * , I 160(n-2) 2n»-6»+4 

(C(2 ) _ .ri,+ *10*9+ „•(„_!)» *9*8 + 240 *8*4 


+96(n—2) 


7w*—14w+9 
n‘(n—1)* 


*7*6+4 


113»‘—620n»+960»“—80071+266 

7l‘(7l —D® 


, 1200 8, n—2 , 671®—12»+9 

~^ n*( 7 i-iy® *8*9+4800 *7*8*9+ 2400 * 6 * 4*9 

. rtv 31n—53 2 1 12n+7 « 

+ 160(71—2) „8(„_1)4 K«<Cs+960(7t 2) „8(^_1)6 *6*9 

. 97i®-287i+16 , ,„„ll7i»-4l7i®+697i-31 . 

+ 1920(71-2) Vi^n-D* *6*4*8+480-,i«(„-i)8-*’ 


SFUV/Vr a I 

• 71^(71- 1?'^®*^® + 


38400(n~2) 
7 i*(n— 1 )^ 


<6*^8^1+9600 


4n® — 97 i-h 6 


8 —7n4'6 2 , 671—12 4 , 28800 4 

+28800 „9(„_i)6 *4*8*9+960(71—2) „9(„_ j^e *8+ „(„_i)4 * 4*9 

~i-B8400 ^ ^ ic®ic*4- it* 


(65) 





Some idea of the advantage of using the cumulative moment func¬ 
tions in place of the moments will be obtained by comparing the above 
formula (14) with the corresponding formula as obtained by Tchouproff, 
and corrected by Church : 

8 1 

+ 48/1, - SOm?) 

— (4/xh— 40/Z(j //2—98^5/^3~ 54Mf 4* 336 /X 4 nl + 5*28/u 5 — SOd^tJ) 

- (6/*e-96/x,ju,- 176 /x5/*s- 102mH924^4«!+1232^1^3-1044,4^ 

-^(4Mg-88M,y>43-160,43M3-95A45+1050M4/»|+1360yu|M3-1395M5) 

+ (M8-28,4,M3-66,4.,43-36Mi+420M4M|+5fi0,4|M3-630,4;). 

The term involving only $\kappa_2$ in $\kappa(2^r)$ is already known as the r-th
semi-invariant of the distribution of the variance for samples from the
normal curve, and is simply $2^{r-1}(r-1)!\,\kappa_2^r/(n-1)^{r-1}$. The
corresponding term in $\kappa(3^4)$ is of interest as showing that the
distribution of $k_3$ in samples from a normal distribution, though
necessarily symmetrical, yet tends somewhat slowly to normality.
Comparing this term of $\kappa(3^4)$ with that of (4) it is evident that

$$\frac{\kappa(3^4)}{\{\kappa(3^2)\}^2} = \frac{18(5n-12)}{(n-1)(n-2)},$$

or is somewhat greater than 90/n, a fact which indicates that the
occurrence of values of $k_3$ greater than 2 or 3 times its standard error
will, except in very large samples, be materially more frequent than one
would judge from an assumed normal distribution. The effect upon tests of
normality will be examined in §§ 11 and 12.
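
A small numerical illustration of how slowly this ratio falls may be useful. The sketch assumes that the ratio for normal samples reads 18(5n-12)/((n-1)(n-2)), the form consistent with the statement that it somewhat exceeds 90/n; the function name is hypothetical.

```python
from fractions import Fraction


def kurtosis_ratio_of_k3(n):
    """Assumed ratio kappa(3^4) / kappa(3^2)^2 for samples of n from a normal."""
    return Fraction(18 * (5 * n - 12), (n - 1) * (n - 2))


# Ratio and the rough value 90/n for a few sample sizes.
comparison = {n: (kurtosis_ratio_of_k3(n), Fraction(90, n)) for n in (10, 50, 100, 1000)}
```

On this form the ratio exceeds 90/n for every n of 4 or more, and is still close to 0.9 for samples of 100.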

8. Bivariate and multivariate distributions.

The extension to bivariate and multivariate data of the methods of 
classification and calculation developed above is of both practical and 
theoretical importance. Apart from the variance the product moment 
of a bivariate distribution is the most important of all moment statistics. 

Moreover, the multivariate formulae, by reason of their greater 
number, and the confusion caused by the various possible notations in 





which they may be expressed, are particularly in need of orderly classifica¬ 
tion. It will be found, in addition, that the examination of the multi¬ 
variate formulae in their generality throws much light on the expres¬ 
sions already obtained. 

It will be seen that just as the univariate formulae correspond to all 
the possible partitions of unipartite numbers, so the multivariate 
formulae correspond to all the possible partitions of multipartite numbers, 
having multiplicities equal to the number of variates. 

To make the notation clear let us consider, in the first place, two
variates only, and let the frequency with which the two variates x and y
fall simultaneously in the ranges dx and dy be

$$df = \phi\,dx\,dy,$$

in which $\phi$ is the simultaneous frequency function of x and y.

The general moment about any origin is defined as

$$\mu_{pq} = \iint x^p y^q\,\phi\,dx\,dy$$

over the whole range of possible values of the variates. So far as these
moments have a meaning we can build up the expression

$$M = \sum_{p=0}^{\infty}\sum_{q=0}^{\infty}\mu_{pq}\,\frac{t_1^p\,t_2^q}{p!\,q!},$$

and equally, with the same limitation, the coefficients of the expression

$$K = \log M = \sum_{p=0}^{\infty}\sum_{q=0}^{\infty}\kappa_{pq}\,\frac{t_1^p\,t_2^q}{p!\,q!}$$
will be well defined. The general expressions connecting the cumulative
moment functions, $\kappa$, with the moments, $\mu$, of the simultaneous
distribution are analogous to those given for univariate distribution; if

$$\left((p_1 q_1)^{\pi_1}\,(p_2 q_2)^{\pi_2}\cdots\right)$$

is any partition of the bipartite number r, s consisting of $\rho$ parts,

$$\kappa_{rs} = \sum (-)^{\rho-1}(\rho-1)!\;\frac{r!\,s!}{\pi_1!\,\pi_2!\cdots(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots(q_1!)^{\pi_1}(q_2!)^{\pi_2}\cdots}\;\mu_{p_1 q_1}^{\pi_1}\,\mu_{p_2 q_2}^{\pi_2}\cdots$$

and

$$\mu_{rs} = \sum \frac{r!\,s!}{\pi_1!\,\pi_2!\cdots(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots(q_1!)^{\pi_1}(q_2!)^{\pi_2}\cdots}\;\kappa_{p_1 q_1}^{\pi_1}\,\kappa_{p_2 q_2}^{\pi_2}\cdots,$$

the summation being taken over all possible partitions. For any sample,
we may define $s_{pq}$ as the sum of the values of $x^p y^q$ for each pair of values
in the sample, and obtain, as for single variates, the statistics
$k_{11}, k_{21}, k_{31}, k_{22}$, etc., as expressions in terms of these sums, with such
coefficients that the mean value of $k_{pq}$ shall be $\kappa_{pq}$. Thus we have

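$$k_{11} = \frac{1}{n-1}\left(s_{11} - \frac{1}{n}s_{10}s_{01}\right),$$

$$k_{21} = \frac{n}{(n-1)(n-2)}\left(s_{21} - \frac{2}{n}s_{11}s_{10} - \frac{1}{n}s_{20}s_{01} + \frac{2}{n^2}s_{10}^2 s_{01}\right),$$

$$k_{31} = \frac{n}{(n-1)(n-2)(n-3)}\Big\{(n+1)s_{31} - \frac{n+1}{n}s_{30}s_{01} - \frac{3(n+1)}{n}s_{21}s_{10} - \frac{3(n-1)}{n}s_{20}s_{11}$$
$$\qquad + \frac{6}{n}s_{11}s_{10}^2 + \frac{6}{n}s_{20}s_{10}s_{01} - \frac{6}{n^2}s_{10}^3 s_{01}\Big\},$$

$$k_{22} = \frac{n}{(n-1)(n-2)(n-3)}\Big\{(n+1)s_{22} - \frac{2(n+1)}{n}s_{21}s_{01} - \frac{2(n+1)}{n}s_{12}s_{10} - \frac{n-1}{n}s_{20}s_{02} - \frac{2(n-1)}{n}s_{11}^2$$
$$\qquad + \frac{8}{n}s_{11}s_{10}s_{01} + \frac{2}{n}s_{02}s_{10}^2 + \frac{2}{n}s_{20}s_{01}^2 - \frac{6}{n^2}s_{10}^2 s_{01}^2\Big\}.$$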

The mean value of any product involving such statistics, as, for
example, $k_{20}k_{11}$, may be evaluated in terms of the cumulative moment
functions of the bivariate distribution; such mean values may be written

$$\mu\begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix},$$

giving one line to each variate; its value is easily found to be

$$\frac{1}{n}\kappa_{31} + \frac{n+1}{n-1}\kappa_{20}\kappa_{11}.$$

Hence, subtracting the product of the mean values, $\kappa_{20}\kappa_{11}$, we have the
formula

$$\kappa\begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} = \frac{1}{n}\kappa_{31} + \frac{2}{n-1}\kappa_{20}\kappa_{11}, \qquad (1a)$$

in which each column represents the particular statistic entering into the
product, and the marginal column found by summing each row is the
multipartite number (31) representing the degree in which each variate
is involved. Similarly, we may deduce the two formulae for partitions




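of the bipartite number (22), namely

$$\kappa\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \frac{1}{n}\kappa_{22} + \frac{2}{n-1}\kappa_{11}^2 \qquad (1b)$$

and

$$\kappa\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \frac{1}{n}\kappa_{22} + \frac{1}{n-1}\left(\kappa_{20}\kappa_{02} + \kappa_{11}^2\right), \qquad (1c)$$

representing the product moment of the estimates of variance of the
two correlated variates, and the variance of the estimated product
moment.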
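
The variance of the estimated product moment can be checked by enumeration in the same manner as the univariate formulae. This sketch is illustrative only: it assumes the form kappa_22/n + (kappa_20 kappa_02 + kappa_11^2)/(n-1), obtains the bivariate cumulants of the second and fourth degree from central moments, and uses a hypothetical three-point bivariate population.

```python
from fractions import Fraction
from itertools import product

# Hypothetical bivariate population: (x, y) pairs with their probabilities.
population = {(0, 0): Fraction(1, 2), (1, 2): Fraction(1, 4), (3, 1): Fraction(1, 4)}

mean_x = sum(p * x for (x, _), p in population.items())
mean_y = sum(p * y for (_, y), p in population.items())


def central_moment(p, q):
    """Population moment of order (p, q) about the means."""
    return sum(
        pr * (x - mean_x) ** p * (y - mean_y) ** q for (x, y), pr in population.items()
    )


kappa_20 = central_moment(2, 0)
kappa_02 = central_moment(0, 2)
kappa_11 = central_moment(1, 1)
kappa_22 = central_moment(2, 2) - kappa_20 * kappa_02 - 2 * kappa_11**2


def k11(sample):
    """The statistic k_11 = (s_11 - s_10 s_01 / n) / (n - 1)."""
    n = len(sample)
    s11 = sum(x * y for x, y in sample)
    s10 = sum(x for x, _ in sample)
    s01 = sum(y for _, y in sample)
    return (s11 - Fraction(s10 * s01, n)) / (n - 1)


n = 3
first = second = Fraction(0)
for combo in product(population.items(), repeat=n):
    prob = Fraction(1)
    for _, p in combo:
        prob *= p
    value = k11([pair for pair, _ in combo])
    first += prob * value
    second += prob * value**2

variance_enumerated = second - first**2
variance_formula = kappa_22 / n + (kappa_20 * kappa_02 + kappa_11**2) / (n - 1)
```

The enumeration also confirms that the mean of k_11 is kappa_11, as the construction of the statistic requires.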

It will be observed that by equating the two variates, which is carried 
out by summing the columns of the partition, and replacing the two 
suffixes of each $\kappa$ by their sum, equations (1a), (1b), and (1c) are reduced
to equation (1). As with univariate formulae, the partitions involving 
parts of the first degree may be directly derived from formulae of lower 
degree and therefore need receive no separate consideration. 

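With more than two variates the bivariate notation may be extended
to the use of three or more rows in the representation of a partition of
a tripartite number, and three or more suffixes to the parameters $\kappa$.
The remaining formulae of the fourth degree are therefore

$$\kappa\begin{pmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} = \frac{1}{n}\kappa_{211} + \frac{2}{n-1}\kappa_{110}\kappa_{101}, \qquad (1d)$$

$$\kappa\begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{n}\kappa_{211} + \frac{1}{n-1}\left(\kappa_{200}\kappa_{011} + \kappa_{110}\kappa_{101}\right), \qquad (1e)$$

$$\kappa\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} = \frac{1}{n}\kappa_{1111} + \frac{1}{n-1}\left(\kappa_{1010}\kappa_{0101} + \kappa_{1001}\kappa_{0110}\right), \qquad (1*)$$

representing the partitions of the tripartite number (211) and of the
quadrupartite (1111), ignoring such as have unitary parts.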

Just as equation (1) may be derived from either of equations (1a),
(1b), or (1c) by identifying the variates, so, by equating appropriate
variates, (1a) may be derived from (1d), or (1b) from (1d), or (1c) from
(1e), and finally all can be derived from the general multivariate
formula (1*).

It appears, therefore, that the formulae appropriate for both univariate
and multivariate distributions may all be expressed in terms of those
representing partitions of the multipartite number $(1^r)$. Thus of the
sixth degree, a series of formulae, of which formula (3) is the final
condensation, will be given by the partition of the multipartite $(1^6)$ into
parts $(1^4 0^2)$ and $(0^4 1^2)$, a series of formulae reducing to (4) by the
partition into the parts $(1^3 0^3)$ and $(0^3 1^3)$, and a series of formulae
reducing to (5) by the partition into parts $(1^2 0^4)$, $(0^2 1^2 0^2)$ and
$(0^4 1^2)$. The presentation of formulae of the type here discussed for the
case of many variates might therefore be completed by the tabulation of the
general multivariate formulae (2*), (3*), etc.

The disadvantage of such a course is that such general formulae will 
consist of a large number of terms equal to the sum of the coefficients 
(of the highest powers of n) of the formulae already tabulated, and that 
each term will consist of a product of $\kappa$'s, each having as many suffixes
as the degree of the equation. The general formulae are therefore ex¬ 
tremely cumbrous, and, as the suffixes will consist merely of repetitions in 
different orders of the numbers 0 and 1, it will be of more value if general 
rules can be found by which these particular combinations are to be 
selected. Such rules will then apply to the univariate and less general 
multivariate cases, the coefficients being merely the number of ways in 
which each selection can be made. 

Now the suffixes of the product terms are merely other partitions of 
the same number, whether unipartite or multipartite, of which one par¬ 
ticular partition specifies our formula; we are therefore concerned with 
the difficult question of the relations which can exist between different 
partitions of the same number. This question may be considered solely 
with respect to unipartite numbers, for if the rules can be made out 
which govern the coefficients in such cases, the same identical rules must 
apply to multipartite numbers by reason of the methods by which one 
formula may be condensed into another. For example, if we start with
the partition $(2^2)$ of the number 4, in conjunction with the rule that only
such partitions are to be considered as in each part involve elements
from both parts of the old partition, we should obtain equally the
coefficient 2 of the term $\kappa_2^2$, and by applying the same rule to the
partition of the multipartite number $(1^4)$ into parts $(1^2 0^2)$ and
$(0^2 1^2)$ should obtain the terms $\kappa_{1010}\kappa_{0101}$ and
$\kappa_{1001}\kappa_{0110}$, having in both cases the same divisor n-1.


9. Empirical statement of the rules for the direct evaluation of the
coefficients.

Although the rules of the combinatorial procedure were not completed
before the development of the method of Section 10, yet so much
can be learned by an empirical study of the formulae that it is
convenient to make a complete statement of the rules in an empirical form,
prior to the demonstration of their validity.

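(1) The coefficient of $\kappa_{p_1}^{\pi_1}\kappa_{p_2}^{\pi_2}\ldots$ in the expression for $\kappa(q_1^{\chi_1} q_2^{\chi_2}\ldots)$ depends
on the possible partitions of the second order of which the column totals
give the partition $(q_1^{\chi_1} q_2^{\chi_2}\ldots)$, and the row totals give the partition
$(p_1^{\pi_1} p_2^{\pi_2}\ldots)$.

For example, the coefficient of $\kappa_6\kappa_2^2$ in the expression for $\kappa(4^2 2)$ may
be obtained by inspection of the partitions of the second order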


    2 2 2 |  6        2 3 1 |  6        3 3 . |  6
    1 1 . |  2        1 1 . |  2        1 . 1 |  2
    1 1 . |  2        1 . 1 |  2        . 1 1 |  2
    ------+---        ------+---        ------+---
    4 4 2 | 10        4 4 2 | 10        4 4 2 | 10


in each of which the sums of the rows constitute the partition $(6\,2^2)$, while
the sums of the columns constitute the partition $(4^2 2)$.

(2) The numerical factor in the contribution made by any partition 
of the second order is the number of ways in which the totals in the 
lower margin may be allocated to form a partition of the type considered. 
The numerical factors corresponding to the three partitions set out above 
are 72, 192, and 32 respectively. In the first case, for example, the 
number may be arrived at from the consideration that the pair of units 
to be separated in the first four may be chosen in six ways, and that 
these may be assigned partners from the second pair in twelve ways. In 
the second case we may choose either of the two fours to be parted into 
$(2\,1^2)$, as in the first column, and, whichever is chosen, we may allocate
the units in the three columns in twelve, four, and two ways respectively;
while, in the third case, we may choose the units from the two fours in 
sixteen ways and associate them in two ways with the units of the two. 
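
The counting described in rule (2) can be checked by brute force. The following is a minimal sketch added for illustration, not Fisher's own procedure. It assumes that the units within each column are distinguishable, and that rows with equal totals, and likewise columns with equal totals, are interchangeable when deciding whether two tables are "of the same type". All function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def multinomial(total, parts):
    """Number of ways of splitting `total` labelled units into groups of the given sizes."""
    out = factorial(total)
    for p in parts:
        out //= factorial(p)
    return out


def tables(row_totals, col_totals):
    """Yield every table of non-negative integers with the given margins."""
    k = len(col_totals)
    ranges = [range(min(r, c) + 1) for r in row_totals for c in col_totals]
    for flat in product(*ranges):
        rows = tuple(flat[i * k:(i + 1) * k] for i in range(len(row_totals)))
        rows_ok = all(sum(r) == t for r, t in zip(rows, row_totals))
        cols_ok = all(sum(c) == t for c, t in zip(zip(*rows), col_totals))
        if rows_ok and cols_ok:
            yield rows


def canonical(table, row_totals, col_totals):
    """Representative of a table under swaps of rows (or columns) with equal totals."""
    best = None
    for rp in permutations(range(len(row_totals))):
        if any(row_totals[i] != row_totals[j] for i, j in enumerate(rp)):
            continue
        for cp in permutations(range(len(col_totals))):
            if any(col_totals[i] != col_totals[j] for i, j in enumerate(cp)):
                continue
            cand = tuple(tuple(table[r][c] for c in cp) for r in rp)
            if best is None or cand > best:
                best = cand
    return best


def numerical_factors(row_totals, col_totals):
    """Map each type of two-way partition to its number of allocations."""
    # Rows with equal totals are unlabelled, so divide out their orderings.
    symmetry = 1
    for t in set(row_totals):
        symmetry *= factorial(row_totals.count(t))
    totals = {}
    for tab in tables(row_totals, col_totals):
        ways = 1
        for j, c in enumerate(col_totals):
            ways *= multinomial(c, [row[j] for row in tab])
        key = canonical(tab, row_totals, col_totals)
        totals[key] = totals.get(key, 0) + ways
    return {key: ways // symmetry for key, ways in totals.items()}


factors = numerical_factors((6, 2, 2), (4, 4, 2))
for tab in (((2, 2, 2), (1, 1, 0), (1, 1, 0)),
            ((2, 3, 1), (1, 1, 0), (1, 0, 1)),
            ((3, 3, 0), (1, 0, 1), (0, 1, 1))):
    print(tab, factors[canonical(tab, (6, 2, 2), (4, 4, 2))])
```

The dictionary also contains the types that rules (3) and (4) reject; only the three tables printed here enter the coefficient.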

(3) Before considering the general rule for determining the function 
of n by which the numerical factor is to be multiplied, it is convenient 
to note that certain partitions of the second order make no contribution 
whatever to the coefficient, and so may be neglected at once. The most 
useful class consists of those in which any row has only one entry other
than zero; for example, such partitions as 


    2 3 1 |  6
    . 1 1 |  2
    2 . . |  2
    ------+---
    4 4 2 | 10





are to be ignored. It is obvious for statistical reasons, as has been
mentioned above, that $\kappa_1$ cannot appear in any of these formulae, and as it
will be seen that the function of n involved depends only upon the con¬ 
figuration of the zeros of the partition of the second order, the necessity 
for this rule will become apparent. More generally, we may exclude 
any partition in which any set of rows is connected to its complementary 
set by a single column only. 

(4) The usefulness of rule (3) for excluding superfluous partitions
is extended by employing it in conjunction with the rule which holds
when any column has only one entry other than zero; for in these cases
we may introduce the factor $n^{-1}$ and ignore the column concerned. For
example, the partition pattern

X X X

X X . 

X X . 

irrespective of its numerical coefficient, is associated with a function 
of n which is one n-th of that associated with 

X X 
X X 
X X 

Moreover, such a partition as 


    4 2 . |  6
    . 1 1 |  2
    . 1 1 |  2
    ------+---
    4 4 2 | 10


is to be ignored (although every row has two entries) by reason of its 
connection with 

X .
X X
X X

in which this condition is not fulfilled. 

With these criteria of rejection one may easily assure oneself that 
the three partitions set out above are the only ones which need be con¬ 
sidered in that case. 

(5) To find, in general, the function of n with which any pattern
is associated, we consider all the possible ways in which the rows can be 





separated into 1, 2, 3, ... separate groups, or separates. Thus with three 
rows we have one separation into one separate, with which is associated 
the factor n; three separations into two separates, with which is asso¬ 
ciated the factor n(n—1); and one separation into three separates, with 
which is associated the factor n(n—l)(n—2). In each of these five 
separations we count in how many separates each column is represented 
by entries other than zero. If in one separate, that column contributes
a factor $n^{-1}$; if in 2, 3, 4, ... separates, the factors are

$$-\frac{1}{n(n-1)},\qquad \frac{2!}{n(n-1)(n-2)},\qquad -\frac{3!}{n(n-1)(n-2)(n-3)},\ \ldots$$

In applying this rule all patterns which are resolvable into two parts, 
each confined to separable sets of rows and columns, must be ignored. 
As an example, consider the five possible separations of the pattern 

X X 
X X 
X X ; 

the first supplies the term

$$\frac{n}{n^2} = \frac{1}{n};$$

the separations into two separates supply

$$\frac{3n(n-1)}{n^2(n-1)^2} = \frac{3}{n(n-1)};$$

while the separation into three separates gives

$$\frac{4n(n-1)(n-2)}{n^2(n-1)^2(n-2)^2} = \frac{4}{n(n-1)(n-2)},$$

the total being $n/\{(n-1)(n-2)\}$, the function appropriate to this
pattern.
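
Rule (5) is mechanical enough to be evaluated by machine. The sketch below is an illustration added here under stated assumptions, not part of the original text. It reads the column factor for a column represented in m separates as $(-1)^{m-1}(m-1)!/\{n(n-1)\ldots(n-m+1)\}$, which agrees with the factors listed for m = 1, 2, 3, 4. It works at a numerical sample size n with exact rationals. Patterns are written as strings with `X` for an entry and `.` for a zero; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    """Yield every separation of `items` into non-empty groups (separates)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def falling(n, k):
    """n(n-1)...(n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def pattern_function(pattern, n):
    """Function of n attached to a pattern by rule (5), evaluated exactly at integer n."""
    n_rows, n_cols = len(pattern), len(pattern[0])
    total = Fraction(0)
    for separation in set_partitions(list(range(n_rows))):
        # Factor for the separation itself: n(n-1)...(n-a+1) for a separates.
        term = Fraction(falling(n, len(separation)))
        for c in range(n_cols):
            # m = number of separates in which column c has a non-zero entry.
            m = sum(any(pattern[r][c] == "X" for r in group) for group in separation)
            term *= Fraction((-1) ** (m - 1) * factorial(m - 1), falling(n, m))
        total += term
    return total


n = 10
print(pattern_function(["XX", "XX", "XX"], n), Fraction(n, (n - 1) * (n - 2)))
print(pattern_function(["XXX", "XX.", "X.X"], n), Fraction(1, (n - 1) ** 2))
```

A pattern in which some row has a single entry, such as `["XX", "X."]`, evaluates to zero, in keeping with rule (3). Disconnected patterns are not detected by this sketch and must still be set aside by hand, as the text requires.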

It is equally easy to verify that the functions appropriate to the 
patterns 

    X X X        X X .
    X X .        X . X
    X . X        . X X





both reduce to $1/(n-1)^2$. The required coefficient is therefore

$$\frac{72}{(n-1)(n-2)} + \frac{224}{(n-1)^2} = \frac{8(37n-65)}{(n-1)^2(n-2)},$$

as appears in formula 28.

It will be obvious from the preceding section that the same rules 
must be applicable to multivariate problems, the only difference being 
that the column totals are then regarded as consisting of objects of two 
or more kinds. For example, to find the coefficient of $\kappa_{33}\kappa_{11}^2$ in the
expression for $\kappa\begin{pmatrix}2&2&1\\2&2&1\end{pmatrix}$, it is merely necessary to note that the second
order partitions of the bipartite (55) corresponding to the three partitions
of 10, used above, can be allocated in 20, 48, and 8 ways respectively,
yielding a coefficient

$$\frac{4(19n-33)}{(n-1)^2(n-2)}.$$

Alternatively the contributions to the coefficient of the univariate formula
may be each split up among the six coefficients by which it is replaced
in the bivariate formula, giving in this case

$$4(19n-33)\,\kappa_{33}\kappa_{11}^2 + 8(11n-20)\,\kappa_{33}\kappa_{20}\kappa_{02} + 8(7n-12)\,\kappa_{42}\kappa_{11}\kappa_{02}$$
$$+\; 8(7n-12)\,\kappa_{24}\kappa_{11}\kappa_{20} + 2(5n-9)\,\kappa_{51}\kappa_{02}^2 + 2(5n-9)\,\kappa_{15}\kappa_{20}^2$$

in place of $8(37n-65)\,\kappa_6\kappa_2^2$.

In the same way the appropriate subdivision of the other bivariate and 
multivariate formulae may be obtained from an examination of the same 
set of two-way partitions, and it will evidently be sufficient for practical
purposes to tabulate all the univariate formulae up to a given degree in 
order that all the corresponding multivariate formulae should be rapidly 
obtainable. 

The algebraic equivalents of a number of the more commonly occur¬ 
ring patterns are given on pages 223—226. 

Some useful patterns. 

Two rows. 

    X X
    X X            $\dfrac{1}{n-1}$

    X X X
    X X X          $\dfrac{n-2}{n(n-1)^2}$

    X X X X
    X X X X        $\dfrac{n^2-3n+3}{n^2(n-1)^3}$

In general, if $a = -1/(n-1)$, we have for the two-row pattern with $s$ columns the function $(1-a^{s-1})/n^{s-1}$.





Three rows.

    X X
    X X            $\dfrac{n}{(n-1)(n-2)}$
    X X

    X X X
    X X X          $\dfrac{n^2-6n+10}{(n-1)^2(n-2)^2}$
    X X X

    X X X
    X X X          $\dfrac{n-3}{(n-1)^2(n-2)}$
    X X .

    X X X     . X X
    X . X  =  X . X        $\dfrac{1}{(n-1)^2}$
    X X .     X X .

    X X X X
    X X X X        $\dfrac{n^4-9n^3+33n^2-60n+48}{n(n-1)^3(n-2)^3}$
    X X X X

    X X X X
    X X X X        $\dfrac{(n-3)(n^2-4n+6)}{n(n-1)^3(n-2)^2}$
    X X X .

    X X X X
    X X X X        $\dfrac{n^2-4n+5}{n(n-1)^3(n-2)}$
    X X . .


XXXX 


XXX. 


7i{n — l)®(/i — 2) 


X XX 
X X . X 
XXX. 


n — 3 
— 1? 


XXXX 
X . X X 
XX.. 


.XXX 
X . X X 

XX.. 


71 — 2 
fl{7l — lf 


XXXX 

..XX 

XX.. 




Four rows.

    X X
    X X            $\dfrac{n(n+1)}{(n-1)(n-2)(n-3)}$
    X X
    X X

    X X X
    X X X          $\dfrac{n^4-12n^3+51n^2-74n-18}{(n-1)^2(n-2)^2(n-3)^2}$
    X X X
    X X X

    X X X
    X X X          $\dfrac{n^3-8n^2+17n+2}{(n-1)^2(n-2)^2(n-3)}$
    X X X
    X X .

    X X X
    X X X          $\dfrac{n(n-4)}{(n-1)^2(n-2)^2}$
    X . X
    X X .

    X X X
    X X X          $\dfrac{n^2-4n-1}{(n-1)^2(n-2)(n-3)}$
    X X .
    X X .

    X X X
    . X X          $\dfrac{n(n-3)}{(n-1)^2(n-2)^2}$
    X . X
    X X .


XXX 
X . X 
X X . 
X X . 


. X X 
X . X 
X X . 
X X . 


(n—If (n —2) 


X X . X XXXX 

X X . X _ X X . X n**—7n*+13// + l 
X X X . X X . . w(7i— If (« —2H//—3) 
XXX. XXX. 



X X 




X X X X 
X X X X 
XX.. 
XX.. 


5n^-h7M-f 1 

n(n — — 2) (n — 3) 


X X X X 

.XXX w^~~8n^-j-23?t —24 
X . X X (n—1)*^ (?i —2)® 

XX.. 


.XXX 

X . X X 4-29/1 — 32 

X X . X (n-l)«(/i-2)« 
XXX. 


X . X X 
X X . X 
XXX. 
XXX. 


»“--7»+14 

(n-l)«(n-2f 


X X X X 
X . X X 
XX.. 
XXX. 


X X X X 
X . . X 

XXX. 
XXX. 


X X 

X X . X //^ — 

XXX . (//-l)"(n-2)“ 


xxxx xxxx 

XXXX _ xxxx //^ — 

. . X X ” X . X . (/i —D®(/i —2)=* 

XX.. XX.. 


xxxx 

.XXX n^—Bn-\-l 
X . X . (/i —l)*(n —2)“ 

XX.. 


.XXX 

X . X X — _ 

X X . X (/I——2)‘'* 
XX.. 


XXXX 
X . . X 
XX.. 
XXX. 


XXXX 
. , X X 
X X . X 
XX.. 


X . X X 
X X . X 
XX.. 
XXX. 

.XXX 
.XXX 
X . . X 

XX.. 


?t —4 

■!)»(//,-2) 


..XX 
X X . X 


X X . 
XXX 


// —3 

(/i-l)“(;i-2) 


XXXX 
X . . X 


xxxx 

..XX 
X . . X 

XX.. 


XXXX 
XX... 
X X . . ( 

..XX 

.XXX 
_ . . X X 

X . .X 
XX.. 


(w—!)*(// —2) 

.XX. 

^ . . X X _ 

— X . . X " 
XX.. 


.XXX 
X . X X 


.XXX 
XX.. 
X . X , 
X . . X 


..XX 
X . X X 
XX.. 
XXX. 


X . X X 
X . . X 
XX., 
XXX. 





Five and six row patterns. 


X X . 

X X . 


X X . 

X X . 


X X . 

X X . 


Ill 

••XX 
X X • X 
XXX- 

X X . 

n(n4-l) 

X . X 

X . X 

. X X 

< X X 
• • X 

< X X 

II 

(«—2? 

X . X 

XXX 

(n—l)'(n—2)(n—8) 


    X X
    X X
    X X            $\dfrac{n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}$
    X X
    X X


XXX 
X X . 
X X . 
X . X 
. X X 


n(;i*--4n —1) 

(;i——8) 


XXX 
XXX 
X X . 
X X . 
X . X 


n (n®— 5n —2) 

(» — 1 (7t — 2)'^ {n — 3) 


XXX 
XXX 
X X . 
X X . 
X X . 


?i(n®—4n — 9) 

(h — — 2; (n — 3) {n — 4) 


XXX 

V V ^ n (ii^—94-19 n -f 5) 

^ ^ (n-l)^«-2)*(n-3)^ 

. X X 


X X . 

X X . 

X . X — Sn-{-2) 

X . X {n — iyHn — 2)Hn-Sf 
. X X 
. X X 


XX. XX 

XX. XX 

X X . M»(n4-1) X X n(;?4-l)(H^-f 15» —4) 

X . X (n—2)*(n—-3) X X (w—1 )(h—— 3)(n —4)(n —6) 

X . X XX 

.XX XX 


The general formula for the two-column pattern with r rows is easily
found, by enumerating the separations into 1, 2, 3, ... separates, to be

$$\sum_{p=1}^{r} \frac{(p-1)!}{p}\,\frac{\Delta^p(0^r)}{n(n-1)\ldots(n-p+1)},$$

where $\Delta^p(0^r)$ stands for the leading p-th advancing difference of the
series $0^r, 1^r, 2^r, \ldots$.
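
As a check on the general two-column formula, the short sketch below (an added illustration; the function names are hypothetical) evaluates the sum with exact rationals and compares it with the closed forms tabulated above for r = 2, 3, 4 and 5 rows. The leading difference is computed from its definition, $\Delta^p(0^r) = \sum_j (-1)^{p-j}\binom{p}{j} j^r$.

```python
from fractions import Fraction
from math import comb, factorial


def leading_difference(p, r):
    """Leading p-th advancing difference of the series 0^r, 1^r, 2^r, ..."""
    return sum((-1) ** (p - j) * comb(p, j) * j ** r for j in range(p + 1))


def two_column_function(r, n):
    """Function of n for the two-column pattern with r rows, at integer n > r."""
    total = Fraction(0)
    for p in range(1, r + 1):
        denom = 1
        for j in range(p):
            denom *= n - j          # n(n-1)...(n-p+1)
        total += Fraction(factorial(p - 1) * leading_difference(p, r), p * denom)
    return total


n = 12
print(two_column_function(2, n) == Fraction(1, n - 1))
print(two_column_function(3, n) == Fraction(n, (n - 1) * (n - 2)))
print(two_column_function(4, n) == Fraction(n * (n + 1), (n - 1) * (n - 2) * (n - 3)))
```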


10. Demonstration of the combinatorial method. 

To demonstrate the validity of the rules which have been stated,
it is useful to consider in what manner the generating function M will
be modified by a functional transformation of the variates. In the case
of a single variate x we have the function

$$M = 1 + \mu_1 t + \mu_2\frac{t^2}{2!} + \ldots,$$

the coefficients of which give the mean values of all powers of x in the
population. By what operation should the function M be transformed
so as to give the corresponding function appropriate to a new variate
which is a known function of x? Suppose that

$$\xi = f(x) = c_0 + c_1 x + c_2 x^2 + \ldots,$$

then the mean value of $\xi$ is

$$c_0 + c_1\mu_1 + c_2\mu_2 + \ldots,$$

which may be written

$$c_0 + c_1\frac{d}{dt}M + c_2\frac{d^2}{dt^2}M + \ldots,$$

or

$$f\left(\frac{d}{dt}\right)M,$$

where t is made to vanish after operation.

Moreover, the mean value of the r-th power of $\xi$ will be given, at
least formally, by the equation

$$\mu_r' = f^r\left(\frac{d}{dt}\right)M,$$

and the new generating function,

$$M' = 1 + \mu_1'\tau + \mu_2'\frac{\tau^2}{2!} + \ldots,$$

may be written

$$M' = e^{\tau f(d/dt)}M,$$

in which the operator is supposed to be expanded in powers of $d/dt$
before attacking the operand.

The corresponding relationship for simultaneous variation is easily
found. In such cases M will be a function of two or more variables
$t_1, t_2, \ldots$ corresponding to the variates x, y, ...; the new variates will be
given functions of the old

$$\xi_1 = f_1(x, y, \ldots),\qquad \xi_2 = f_2(x, y, \ldots),\ \ldots,$$

and the operative expression for the transformation of M is

$$M' = e^{\tau_1 f_1 + \tau_2 f_2 + \ldots}\cdot M.$$

To apply this result to univariate sampling problems, consider the n
observations of the sample as our n original variates, and the symmetric
functions $k_1, k_2, \ldots$ as the new variates the generating function of which
is required. Then, considering first the operand, for the first observation
$x_1$ the $\mu$ generator is $e^{K(t_1)}$, where K is the $\kappa$ generator of the population
sampled, i.e.

$$K(t) = \kappa_1 t + \kappa_2\frac{t^2}{2!} + \kappa_3\frac{t^3}{3!} + \ldots.$$

Moreover, since the n observations are independent, their simultaneous
K generator will be merely the sum of the individual generators, so that
our operand is

$$\exp\left\{\kappa_1 s_1 + \kappa_2\frac{s_2}{2!} + \ldots\right\},$$

in which $s_r = \sum_{p=1}^{n} t_p^r$.


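We may note at once that the coefficient of $\kappa_{p_1}^{\pi_1}\kappa_{p_2}^{\pi_2}\ldots$ in the operand is

$$\frac{s_{p_1}^{\pi_1}\,s_{p_2}^{\pi_2}\ldots}{(p_1!)^{\pi_1}\pi_1!\,(p_2!)^{\pi_2}\pi_2!\ldots}.$$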


The $\mu$ generator of the simultaneous distribution of the k statistics
will be given by the operator

$$e^{\tau_1 k_1 + \tau_2 k_2 + \ldots},$$

in which $k_p$ is interpreted as the same function of $\partial/\partial t_1, \partial/\partial t_2, \ldots$ as
the corresponding k statistic is of $x_1, x_2, \ldots, x_n$. The property by which
these statistics were defined, namely that the mean value of $k_p$ should be
$\kappa_p$, is now seen to imply that


but 



0 , 


where (vi, v^, ...) is any partition of v. If, for example, the partition 
is of two parts, 

» n{n + l) 





in which $t$ and $t'$ are different members of the set $t_1, \ldots, t_n$, it follows

that $k_\nu$ must contain, in addition to the simple term



terms for all two-part partitions of the form 


except when $\nu_1 = \nu_2$, when each operator finds two terms on which it can
act, and its coefficient is therefore to be halved. Thus, if we write




1 ) ...(n—p-f-l) 



where p = ;ri-|-7r3-|-... and etc. are any selection of p out of the 
n variables t, the summation being extended over all such selections, 
then 


K 


= 2 


_2J_ 

(pi !)'••»■,!(Pa!)’»7ra! ... 


g{pVpV---), 


the summation being taken over all partitions of p. 

This structure of the k operator makes it possible to think of the p
acts of differentiation in each operator as p separate objects, the partitions
of which, represented by the g operators, occur each in as many ways as
the objects can be arranged in that partition. We may thus use a
two-way partition to assign how many of these operations are effective against
each of a series of factors constituting the operand.

Let now this operand product be expanded in a number of terms of 
the form 

z(a, 6, c) = 


the summation being taken over all the $n(n-1)(n-2)$ different ways of
selecting $t$, $t'$ and $t''$ from among the set $t_1, \ldots, t_n$. There will then
be a $z$ term for every possible separation of the partition $(p_1^{\pi_1} p_2^{\pi_2}\ldots)$ into
one or more separates. For each two-way partition chosen all these
separations will contribute to the result, and with the same numerical 
coefficient, apart from that contained in the g operators, equal to the 
number of ways of allocating the objects in the two-way partition. 





The number of terms in the z corresponding to any separation into
a separates is $n(n-1)\ldots(n-a+1)$, and this, combined with the factors
in the g operators, gives the functions of n corresponding to any
two-way partition according to rule (5). There remains, however, in M' a
number of terms corresponding to two-way partitions, in which the
columns may be divided into two classes, each confined to different
sets of rows. These introduce terms of a higher order in n, which are
obliterated when we find K' = log M', for in these cases the additional
term in M' will be of the form AB, where both A and B occur also as
other terms in M'.


11. Measures of departure from normality.


The statistical inefficiency of moment statistics from distributions
differing widely from the normal, except when they are of a special type
[10], much reduces their practical importance for curve fitting; but since
they are fully efficient for the normal distribution, they provide an ideal
basis for testing if an observed sample indicates a significant departure
from normality in the population sampled. Significant asymmetry
should, in the first instance, be shown by an excessive value of $k_3$; but
since the variance of the population is usually unknown, but may be
estimated from the value of $k_2$ observed in the sample, the test of
significance will usually involve not the distribution of $k_3$, the moments of
which are given by such formulae as (4) and (20), but the distribution
of the ratio $k_3 k_2^{-3/2}$. Since, for the normal distribution, the variance of
$k_3$ is given by

$$\kappa(3^2) = \frac{6n}{(n-1)(n-2)}\,\kappa_2^3,$$

it will be convenient to show how the moments of such a statistic as

$$x = \sqrt{\frac{(n-1)(n-2)}{6n}}\;k_3 k_2^{-3/2}$$

may be expressed in terms of the general $\kappa(p_1^{\pi_1} p_2^{\pi_2}\ldots)$, for all
distributions, and its particular value obtained for the normal distribution. This
may be done by expanding the factor $k_2^{-3/2}$ in the form

$$\kappa_2^{-3/2}\left(1 + \frac{k_2-\kappa_2}{\kappa_2}\right)^{-3/2},$$


whereupon, in virtue of the expressions connecting the moments $\mu$ with
the semi-invariants $\kappa$, and the mean value of x being zero, we can at once





write down the following expansion for its variance : 

/»*(*) = K^ix) = {«r(3»)—|■-c(3>2)^-^ ^(3»)-r(2“)+*(3>2*)t 

- ^ 13*(3’'2) *-(2*)+,r(3“) *(2“)} + {3.r(3*) <r*(2») [ |, 

#^2 ^a. f 

in which, remembering that a k of p parts involves terms beyond 

n~^ have been omitted, as well as the terms of odd degree which vanish 
for symmetrical distributions. 

Similarly we have 

Mx) = |3-c’(3»)+k( 3*)—^ l6<c(3»2) *(3»)+*(3^2); 

Q-f 

+ =i { 3«*(3»)»(2*) + 6-c»(3* 2) 

+*(3‘) <c(2‘') + 6(c(3*2*) k ( 3 *) f 
— 5| j 18/c(3*2)<c(3*)K(2’) + 3(r®(3“)<f(2*)} 

K,, 

+ i9*‘'(3*)-c»(2»)l} 

Kq I 

and 

/««(*) = |l5<»(3»HJ5*(3‘)*(3») + <(3») 

— — \ 45^(3'' 2) *’■(3’’)+15*(3* 2) *(3’)+16«c(3<) k(3> 2)} 

^2 

45 

+ ^ (15«c»(3’')*(2“) +90<t’'(3’2 )(c(3’')+46k(3*2*)<c»(3*) 

Ktt 

+ 15-t(3^)/f(3’)*(2’)} 

— {15/c“(3’) «:(2®) + 136 k(3’2) /c’‘(3®) *(2*) 

+ ^ {45k»(3»)<c“(2»)1|. 

From these moments of the distribution of sc, the semi-invariants K^ix) 
and Keix) may be obtained by means of the relations 

3K’2(^)f 





giving 


Ktix) = - 'f(3’2)*(3») + ^ *»(3»)/t(2») 


- |-(3‘2) + ^ **(3*2)+|r »(3‘)*(2») 

+ ?? *(3* 2’) *(3*) — ^ *(3* 2) *■(8*') /t(2“) 

- k»(3») /c{2») + ^ **(3») *’(2»)} 

and 

*r,(x) = ^”"^16,1”^^^* {-c(8«)-^ «c(3‘2)k( 3»)-^ /c(3‘)<r(3»2) 

+ *''(3*2)/f(8“) + ^ *(8*2»)/c’'(8») 

kTj, 

+ ^ <c(3*) <c(3’)<c(2’)— ^ **(3*) /c(2") 

Kt K?2 

—5^ k(,S’‘ 2) (c’'(8“) /t(2“)+ *®(3*) k’‘(W: . 


while no higher semi-invariants contain terms involving only n“‘^. 

The formulae tabulated give all the values required for K^ix ); thus 
for samples from the normal distribution 


— n—1 ” («—l)(/i —2) 


»», *(2*) = 


8 


*^8’2) = ,f,*(3’‘), ,f(3»2*) = 4 “ 

and substituting these values, we find

$$\kappa_2(x) = 1 - \frac{6}{n} + \frac{22}{n^2}.$$


Ut—if 
(3’), 


5 *2' 


To evaluate $\kappa_4(x)$ we need in addition

$$\kappa(3^4) = \frac{648(5n-12)n^2}{(n-1)^3(n-2)^3}\,\kappa_2^6,$$


and the leading term in $\kappa(3^4 2)$; this latter only requires the enumeration
of the number of ways of building up two-way partitions of $(3^4 2)$ with
row totals $(2^7)$, or the number of ways of connecting up the symbolical
figures

[three figures, not reproduced]

which can be done in 15552, 7776 and 15552 ways respectively, showing
that $\kappa(3^4 2)$ from normal samples is approximately $38880n^{-4}$.

With this value, that of $\kappa_4(x)$ is evaluated as

$$\kappa_4(x) = \frac{36}{n} - \frac{1296}{n^2}.$$

Finally, for $\kappa_6(x)$ the only new $\kappa$ required is $\kappa(3^6)$, involving the figures
having six points from each of which three lines radiate:





which supply a contribution of 47520n~* to Ke(x), or, 
terms, lead to the value 




16120 


with the other 


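For the practical application of the function x in testing asymmetry
we shall now require to construct a function of x which, as far as terms
in $n^{-2}$, is distributed normally. Putting

$$x = \beta\xi + \delta(\xi^3-3\xi) + \eta(\xi^5-10\xi^3+15\xi),$$

where $\xi$ is normally distributed with unit variance, it is easy to obtain

$$\kappa_2(x) = \beta^2 + 6\delta^2,$$

$$\kappa_4(x) = 24\beta^3\delta + 216\beta^2\delta^2,$$

$$\kappa_6(x) = 720\beta^5\eta + 3240\beta^4\delta^2,$$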



20.234 

234 


B. A. Fisher 


which are satisfied by 





J_ 


S = 




or, inverting the relation between $\xi$ and x, we have

^ = X (1+f (l-^)(.-- 3 *)-|| (x‘-10x»+16x). 

This translation formula makes it possible to assess the numerical
effects upon tests of significance of the actual distribution; Tables 2 and 3
show the values of various possible formulae for the test deviate in the
region, important for tests of significance, x = 1.8 to 2.2, and indicate
that these effects are very serious.


TABLE 2.

Comparison of deviates in five formulae for testing asymmetry,
n = 100.

      (a)        (b)        (c)        (d)        (e)
                             x        $\xi_1$     $\xi_2$

    1.7999     1.8274      1.8      1.8476     1.8603
    1.9999     2.0306      2.0      2.0300     2.0686
    2.1999     2.2835      2.2      2.2063     2.2630

TABLE 3.

Comparison of deviates in five formulae for testing asymmetry,
n = 50.

      (a)        (b)        (c)        (d)        (e)
                             x        $\xi_1$     $\xi_2$

    1.7990     1.8658      1.8      1.8960     1.9463
    1.9996     2.0620      2.0      2.0600     2.1746
    2.1996     2.2682      2.2      2.2106     2.4016


In this region an error of 0.1 in the deviate produces an error of
about 24 per cent. in the probability deduced, and, although high
accuracy in the latter is not a necessity, little reliance can be placed upon
tests when the deviate may be biased by as much as 0.2. Of the
formulae tested, the formula (a) in terms of crude moments is almost
equivalent to the use of x, and these are evidently the most in error. Of
the simple formulae (b) is least in error, and for samples of 100 this error
is only about .03. The value $\xi_1$ shows the effect of using terms of the
first degree only in the translation formula, while $\xi_2$ shows the effect
of using also terms in $n^{-2}$. There is evidently little to be gained by
using $\xi_1$ instead of the simple formula (b), which latter gives
apparently the better values for deviations exceeding 2.0. For samples
as small as 50 the fully corrected value $\xi_2$ is evidently required, and in
view of the uncertainty of the effect of the omitted terms in $n^{-3}$, etc.,
no reliable test of normality for materially smaller samples can be said
to be available. As in so many other cases, the adequate treatment
even of moderately small samples is not well approached by series in
powers of $n^{-1}$.

12. The significance of the fourth moment.

The sampling variance of $k_4$ from a normal sample is

$$\frac{24n(n+1)}{(n-1)(n-2)(n-3)}\,\kappa_2^4;$$

in testing the significance of such a value, we should therefore naturally
calculate

$$x = \sqrt{\frac{(n-1)(n-2)(n-3)}{24n(n+1)}}\;k_4 k_2^{-2}$$

as a variate which, with increasing sample number, tends to be normally
distributed with unit variance. With finite samples the distribution is
asymmetrical, for $\kappa(4^3)$ is not zero. The true mean value of x is zero,
for with a normal distribution $\kappa(4\,2^p)$ is zero for all values of p, whence it
follows that the mean of $k_4$ is zero independently for all values of $k_2$.

Tlie mean value of is easily expanded in the form 


(n—-l)(n — 2 )(a —3) 
2471 (;t-H 1) 


]*-(4'‘)- — «-(4''2)+ *(4'-^ ,f(2“)+... [ 

1 K 2 K.j > 


or 


32 20 

?i—l n — 1 


as far as n~^. 

The mean value of as far as is 


f (7l-l)(7t-2)(71-3) ]g 
[ 2471(71-1-1) /cj { K2 


«:(4®2)-f 


^1 

^2 


*:(4®)k(2'') + ...|. 


Now ic(4*) has been evaluated by the direct combinatorial method, 
giving (formula 67) 


_ 1728«(n+l)(«‘-5n+2) , 


1728 

^-3 


(^ 1 + 8 ); 


or, as near as needed, 





while the leading term of k( 4*2) is 1728n“®Xl2, giving, as the mean 
value of a;*, 


{ j l f • 6 V6{n+8-72+42}. 


or 



Next, the mean value of is, as far as n“*, 


f (n—l)(7i~2)(n-^3) )® 
\ 24n(n+l)4 J 


8«:*(4‘^4-<f(4*) 


—16/c(4») *(4“ 2) f + ^ i 8*»(4’) *(2’') (}; 


whence, subtracting three times the square of the mean of x®, there 
remains 


K^{X) = 


( (n^l)(n-2)(n~8) )® 

i 24n(n+l)*^J / 



/c(4®)^(4®2)-f-;^ 



The leading term in k( 4*)~ 576^5 comes to 636n”®, to which the other 
terms add —192 and -4-96 respectively, leaving 


K^ix) : 


540 

n 


For the mean value of x® we shall need 


(^)* {lOK(4>)<c(4”)+<f(4') —^ 10 k(4*2)k(4*) 

+ 10«r(4’') k(4“2)4-^-c(4“)ic(4*)<c(2“)}, 

K.^ ) 

whence, deducting lOKa(x) , ksCx), there remains 

»(4»2)-c{4*)+^ *(4»)<c(4»)*(2«)l. 

ICjj IC2 / 

or ^ . 144 V6. 

n* ^ 


Now if $\xi$ is normally distributed with unit variance, and x can be
expressed approximately in the form

$$x = \beta\xi + \gamma(\xi^2-1) + \delta(\xi^3-3\xi) + \epsilon(\xi^4-6\xi^2+3) + \ldots,$$



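we have

$$\kappa_1(x) = \mu_1(x) = 0,$$

$$\kappa_2(x) = \mu_2(x) = \beta^2 + 2\gamma^2 + 6\delta^2 + 24\epsilon^2 + \ldots,$$

$$\kappa_3(x) = \mu_3(x) = 6\beta^2\gamma + 8\gamma^3 + 36\beta\gamma\delta + \ldots,$$

$$\mu_4(x) = 3\beta^4 + 24\beta^3\delta + 60\beta^2\gamma^2,$$

whence

$$\kappa_4(x) = 24\beta^3\delta + 48\beta^2\gamma^2,$$

and

$$\mu_5(x) = 60\beta^4\gamma + 120\beta^4\epsilon + 1080\beta^3\gamma\delta + 680\beta^2\gamma^3,$$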

whence /cgU) = 120/5^€-f 7208®y^+5(>0/5®y'; 

and equating these to the actual values, neglecting n“^, we have the 
translation formula 


or, inversely, 

+ 4(^-8x)+5^^ife--to'+8). 


Summary. 

The equations which connect the moment functions of the sampling 
distribution of moment statistics with the moment functions of the
population from which the samples are drawn correspond in univariate 
problems to all the partitions of all the natural numbers, and in multi¬ 
variate problems to all the partitions of all multipartite numbers. Very 
few of this system of equations have hitherto been obtained owing to the 
algebraical complexity of their direct evaluation. The formulae are very 
much simplified (i) by using the semi-invariants instead of the moments 
of the population, and (ii) by using the system of moment statistics, the 
mean sampling value of each of which is the corresponding semi- 
invariant. The relations which necessarily exist between the different 
multivariate formulae demonstrate that all of these, as well as the uni¬ 
variate formulae, must be derivable from a system of rules associating 





different two-way partitions of multipartite and unipartite numbers with 
corresponding functions of the sample number n. 

Rules are given and illustrated which enable any term of any of
these formulae to be obtained directly from an examination of the appro¬ 
priate partition. Their general validity is demonstrated by a theorem 
which connects the moment generating function of any distribution with 
the corresponding function of any functionally related set of variates. 
Complete univariate formulae are given up to the tenth degree, and some 
new results are applied to the theory of samples from a normal popula¬ 
tion. 


References. 

1. W. F. Sheppard, "On the application of the theory of error to cases of normal
distribution and normal correlation", Phil. Trans. (A), 192 (1897), 101-167.

2. K. Pearson, "On the probable errors of frequency constants", Biometrika, 2 (1902),
273-281.

3. K. Pearson, "On the probable errors of frequency constants", Biometrika, 9 (1913), 1-10.

4. L. Isserlis, "On certain probable errors and correlation coefficients of multiple frequency
distribution with skew regression", Biometrika, 11 (1916), 185-190.

5. L. Isserlis, "Formulae for determining the mean value of products of deviations of mixed
moment coefficients in two to eight variables in samples from a limited population",
Biometrika, 12 (1919), 183-184.

6. H. E. Soper, "On the probable error of the correlation coefficient to the second
approximation", Biometrika, 9 (1913), 91-116.

7. "Student", "The probable error of a mean", Biometrika, 6 (1908), 1-26.

8. A. A. Tchouproff, "On the mathematical expectation of the moments of frequency
distributions", Biometrika, 12 (1919), 140-169 and 186-210.

9. A. E. R. Church, "On the moments of the distribution of squared standard deviations for
samples of n drawn from an indefinitely large population", Biometrika, 17 (1925),
79-83.

10. R. A. Fisher, "On the mathematical foundations of theoretical statistics", Phil. Trans.
(A), 222 (1921), 309-368.

11. J. Wishart, "A problem in combinatorial analysis giving the distribution of certain
moment statistics", Proc. London Math. Soc. (2), 29 (1929), 309-321.



21.15a 


21

THE MOMENTS OF THE DISTRIBUTION FOR
NORMAL SAMPLES OF MEASURES OF DEPARTURE
FROM NORMALITY


AUTHOR’S NOTE 

In the previous paper (Paper 20) it had been shown that mean powers
and other symmetric functions of the sampling distributions of
the k-series of symmetric functions of the observations could be
obtained from the combinatorial properties of certain bipartitional
functions, thus reducing to direct arithmetic the very heavy algebra
by which this type of problem had ordinarily been treated. Equivalent
properties are shown in this paper to be possessed by the ratios
of such functions to the powers of the same degree of the estimated
variance, $k_2$, in samples from a normal population. We thus find
exact formulae in place of the approximate series arrived at in the
earlier publication, Section 11 of which is here superseded. The
recurrence relation of Section 2 and the use of symbolic operators in
Section 5 may well have applications beyond the immediate problem.


* Reprinted from Proceedings of the Royal Society, Series A, Vol. 130, pp. 17-28,
1930.



21.16 


The Moments of the Distribution for Normal Samples of Measures of
Departure from Normality.

By R. A. Fisher, Sc.D., F.R.S., Statistical Department, Rothamsted 
Experimental Station, Harpenden, Herts. 


1. The Appropriate Symmetric Functions of the Observations.

If $x_1, \ldots, x_n$ are the values of a variate observed in a sample of n, from any
population, we may evaluate a series of statistics (k) such that the mean value
of $k_p$ will be the pth cumulative moment function of the sampled population;
the first three of these are defined by the equations:

$$k_1 = \frac{1}{n}S(x),\qquad k_2 = \frac{1}{n-1}S(x-k_1)^2,\qquad k_3 = \frac{n}{(n-1)(n-2)}S(x-k_1)^3;$$

then it has been shown (Fisher, 1929)* that the cumulative moment functions


* R. A. Fisher, "Moments and Product Moments of Sampling Distributions," 'Proc.
Lond. Math. Society,' Series 2, vol. 30, pp. 199-238 (1929).



of the simultaneous distribution, in samples, of $k_1, k_2, k_3, \ldots$, may be obtained
by the direct application of a very simple combinatorial procedure.

The simplest measure of departure from normality will then be

$$\gamma = k_3 k_2^{-3/2},$$

a quantity which is evidently independent of the units of measurement, and 
in samples from a symmetrical distribution will have a distribution sym¬ 
metrical about the value zero. In testing the evidence provided by a sample, 
of departure from normality, the distribution of this quantity in normal samples 
is required. 

Hitherto the exact values of the moments of this distribution have been 
unknown, though a method of calculating the moments for large samples, 
in a series of any number of powers of $n^{-1}$, has been given. It will be shown
that the distribution may be investigated by means of a recurrence relation, 
which yields the moments of the distribution and seems well adapted for the 
investigation of its other properties. 

2. The Recurrence Relation for $\gamma$.

For values of p from 1 to n — 1, let us define 5^ by the relation 

5. = — *l) + —^ (®n — *l). 

n — 1 

or, 

n — 1 

Then evidently 

"s’ (5,) = 0 

and 

S (X, - = (x„ - (5*^ 

while 

S (X, - k,f = (x„ - k,Y (1 - ^ (*, - h) S (5») + s (5»), 

SO that, if 

n 


n — 1 


(a:„-^j)* = cot*0.S«*), 



we may express the ratio

Yn = 

in terms of the ratio 


ly/n - 


-S(a; —jfci)* ~ SJV2 (a; — 


Y„-, = s (^») ss/* (^*) 

W — o 

in the recurrence relation 

= (n-y”«(«-2r ® ® ° 

where $\gamma_{n-1}$ is the value of $\gamma$ calculated from the sample values excluding $x_n$,
and $\gamma_n$ is the value calculated from the whole sample of n values.

The value of the recurrence relation in this form lies in the fact that the
distribution of $\theta$ is independent of that of $\gamma_{n-1}$, whatever may be the values
of ^ 1 ,^n-i -i- (^*), if a be the standard error of the population sampled 

the distribution of 


will be 

hence if 


t == {x„ — hi)V n/n — 1 


<y \/2Tt 

c = t/Vs, 

where S stands for S (^®), since the distribution of S is known to be 


d/=. 


(2t,2)*(n-2) Vl: 
the distribution of c will be given by 




i/=^.i22a_ 

<T v27c n — 


n — 4 , 1 

2 ' Jo 


n — 3, 


dc 


~ ! 'y/n (1 + c*) ® 


or, if c is cot 0, the distribution of 0 is 

n ~ 3 
df 


2 


n — 4 


n’*-® 0 d0. 


1 \/te 


( 2 ) 


independently of the value of Yn—i* ^ indeed is obvious if the sample is con*> 
sidered geometrically. 
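
As a numerical check on the recurrence, the sketch below carries the second moment of $\gamma$ upward from $n = 3$. It assumes the recurrence in the form $\gamma_n = A_n\gamma_{n-1}\sin^3\theta + \sqrt{n}\cos^3\theta - \frac{3\sqrt{n}}{n-2}\cos\theta\sin^2\theta$ with $A_n = n(n-3)/\{(n-1)^{1/2}(n-2)^{3/2}\}$, takes $\theta$ to be distributed as $\sin^{n-3}\theta$ independently of $\gamma_{n-1}$, and starts from $\mu_2 = 3/2$ for $n = 3$ (the value given by $\gamma = \sqrt{3}\cos 3\phi$ with $\phi$ uniform). The function names are illustrative only and do not come from the paper. The result is compared with the closed form $6n(n-1)/\{(n-2)(n+1)(n+3)\}$.

```python
from math import gamma, sqrt


def beta(a, b):
    """Euler beta function B(a, b)."""
    return gamma(a) * gamma(b) / gamma(a + b)


def trig_mean(n, p, q):
    """E[sin^p(theta) * cos^q(theta)] when theta has density proportional
    to sin^(n-3)(theta) on (0, pi); q must be even."""
    return beta((n - 2 + p) / 2, (q + 1) / 2) / beta((n - 2) / 2, 0.5)


def mu2_closed_form(n):
    """Second moment of gamma for samples of n, closed form."""
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))


def mu2_by_recurrence(n):
    """Second moment of gamma obtained by stepping the recurrence from n = 3."""
    mu2 = 1.5  # n = 3: gamma = sqrt(3) * cos(3 * phi), phi uniform
    for k in range(4, n + 1):
        a_k = k * (k - 3) / (sqrt(k - 1) * (k - 2) ** 1.5)
        b_k = 3 / (k - 2)
        # Second moment of sqrt(k) * (cos^3 - b_k * cos * sin^2).
        tail = k * (trig_mean(k, 0, 6)
                    - 2 * b_k * trig_mean(k, 2, 4)
                    + b_k ** 2 * trig_mean(k, 4, 2))
        # The cross term vanishes: gamma_{k-1} has mean zero and is
        # independent of theta.
        mu2 = a_k ** 2 * mu2 * trig_mean(k, 6, 0) + tail
    return mu2


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 25):
        print(n, mu2_by_recurrence(n), mu2_closed_form(n))
```

The same pattern, with higher powers of the right-hand side expanded, would give the fourth and sixth moments, at the cost of the heavier algebra the paper mentions.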





3. The Distribution for Samples of 3. 

The terminal values of $\gamma_n$ are given by putting $\theta = 0$ and $\pi$, when $\gamma_n = \pm\sqrt{n}$ 
irrespective of the values of $\gamma_{n-1}$, which is indeed indeterminate at these values. 
The recurrence relation enables us also by means of a single integration to 
obtain the distribution of $\gamma_n$ from that of $\gamma_{n-1}$, or alternatively to obtain the 
moments of the distribution of $\gamma_n$ in terms of those of the distribution of $\gamma_{n-1}$. 
To utilise the recurrence relation in these ways we shall need the distribution 
of $\gamma$ for the smallest possible samples, i.e., for $n = 3$. 

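When $n = 3$, we may represent the 3 deviations of the observations from 
the mean of the population by 

$$x_1 = b + a\cos\phi, \quad x_2 = b + a\cos\left(\phi + \frac{2\pi}{3}\right), \quad x_3 = b + a\cos\left(\phi + \frac{4\pi}{3}\right),$$

then the mean of the sample is $b$, and the statistics $k_2$ and $k_3$ are given by 

$$k_2 = \tfrac{1}{2}a^2\left\{\cos^2\phi + \cos^2\left(\phi + \frac{2\pi}{3}\right) + \cos^2\left(\phi + \frac{4\pi}{3}\right)\right\} = \tfrac{3}{4}a^2,$$

$$k_3 = \tfrac{3}{2}a^3\left\{\cos^3\phi + \cos^3\left(\phi + \frac{2\pi}{3}\right) + \cos^3\left(\phi + \frac{4\pi}{3}\right)\right\},$$

but 

$$\cos^3\phi = \tfrac{1}{4}(\cos 3\phi + 3\cos\phi),$$

hence 

$$k_3 = \tfrac{9}{8}a^3\cos 3\phi,$$

and 

$$\gamma = k_3\,k_2^{-3/2} = \sqrt{3}\,\cos 3\phi.$$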


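For the sampling distribution of $\gamma$, since 

$$\frac{\partial(x_1, x_2, x_3)}{\partial(b, a, \phi)} = \frac{3\sqrt{3}}{2}\,a$$

and 

$$x_1^2 + x_2^2 + x_3^2 = 3b^2 + \tfrac{3}{2}a^2,$$

we have 

$$df = \frac{1}{(\sigma\sqrt{2\pi})^3}\,e^{-(3b^2 + \frac{3}{2}a^2)/2\sigma^2}\cdot\frac{3\sqrt{3}}{2}\,a\,db\,da\,d\phi,$$

which on integration with respect to $b$ yields 

$$df = \frac{3}{2(\sigma\sqrt{2\pi})^2}\,a\,e^{-3a^2/4\sigma^2}\,da\,d\phi,$$

and on integration with respect to $a$, yields simply 

$$df = \frac{d\phi}{2\pi}.$$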




but if 


then 


, _ (n - 2)^ (n + l)(n + 3) (n + 5) (n + 7) (n -f 9) , 

1 ) 

«’»' — ®»+i' = (2n« + 23n» + 2n* — 237n + 70) 

n* (n — 1)* 


so that 


= 108 («* + 24n + C + ^ , 

where C is to be determined from 


so that 


C = — 149, 

u>„’ = — y(TO — 2) ^ 


and the fourth moment of $\gamma$ is given by 

$$\mu_4(\gamma) = \frac{108n^2(n-1)^2(n^2 + 27n - 70)}{(n-2)^3(n+1)(n+3)(n+5)(n+7)(n+9)}.$$

Similarly the sixth moment is found to be 

$$\mu_6(\gamma) = \frac{3240n^3(n-1)^3(n^4 + 84n^3 + 2695n^2 - 15168n + 20020)}{(n-2)^5(n+1)(n+3)\cdots(n+15)},$$


and the same method may be applied to determine the higher moments. 

From the moments the cumulative moment functions may be determined 
by the invariable relationships, which for symmetrical distributions become 

$$\kappa_2 = \mu_2,$$

$$\kappa_4 = \mu_4 - 3\mu_2^2,$$

$$\kappa_6 = \mu_6 - 15\mu_2\mu_4 + 30\mu_2^3,$$

which give us the values 

$$\kappa_2 = \frac{6n(n-1)}{(n-2)(n+1)(n+3)},$$

$$\kappa_4 = \frac{1296n^2(n-1)^2(n-7)(n^2 + 2n - 5)}{(n-2)^3(n+1)^2(n+3)^2(n+5)(n+7)(n+9)},$$

$$\kappa_6 = \frac{466560n^3(n-1)^3(7n^6 - 88n^5 - 286n^4 + 3284n^3 + 1667n^2 - 22108n + 20020)}{(n-2)^5(n+1)^3(n+3)^3(n+5)(n+7)(n+9)(n+11)(n+13)(n+15)},$$




from which we may derive the ratios, 

$$\frac{\kappa_4}{4!\,\kappa_2^2} = \frac{3(n-7)(n^2 + 2n - 5)}{2(n-2)(n+5)(n+7)(n+9)},$$

$$\frac{\kappa_6}{6!\,\kappa_2^3} = \frac{3(7n^6 - 88n^5 - 286n^4 + 3284n^3 + 1667n^2 - 22108n + 20020)}{(n-2)^2(n+5)(n+7)(n+9)(n+11)(n+13)(n+15)},$$

which determine the rate of approach of the distribution of $\gamma$ to normality 
as the sample number $n$ is increased. It will be noticed that $\kappa_4$ changes from 
a negative to a positive sign at $n = 7$, and that the corresponding ratio rises 
to its greatest value about 0.024 at $n = 22$, while the corresponding ratio for 
$\kappa_6$ starting from positive values has a negative maximum about $-0.0016$ at 
$n = 8$, is positive again at $n = 13$, and reaches a positive maximum about 
$+0.0027$ at $n = 32$. Using the reciprocal of $n$ as abscissa the course of these 
two ratios is shown in figs. 1 and 2. 
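
The turning points quoted above can be checked by direct evaluation. The following is a minimal sketch that assumes the two ratios are $\kappa_4/(4!\,\kappa_2^2)$ and $\kappa_6/(6!\,\kappa_2^3)$ with the polynomials as printed; the function names are illustrative, and exact rational arithmetic is used so that sign changes are not blurred by rounding.

```python
from fractions import Fraction


def kappa4_ratio(n):
    """kappa_4 / (4! * kappa_2^2) for the distribution of gamma."""
    n = Fraction(n)
    num = 3 * (n - 7) * (n**2 + 2 * n - 5)
    den = 2 * (n - 2) * (n + 5) * (n + 7) * (n + 9)
    return num / den


def kappa6_ratio(n):
    """kappa_6 / (6! * kappa_2^3) for the distribution of gamma."""
    n = Fraction(n)
    poly = (7 * n**6 - 88 * n**5 - 286 * n**4 + 3284 * n**3
            + 1667 * n**2 - 22108 * n + 20020)
    den = ((n - 2)**2 * (n + 5) * (n + 7) * (n + 9)
           * (n + 11) * (n + 13) * (n + 15))
    return 3 * poly / den


if __name__ == "__main__":
    for n in (7, 8, 13, 22, 32):
        print(n, float(kappa4_ratio(n)), float(kappa6_ratio(n)))
```

Tabulating both functions against $1/n$ would give the curves that figs. 1 and 2 describe.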



Fig. 1. Graph of the ratio $\kappa_4/(4!\,\kappa_2^2)$ of the distribution of $\gamma$, 

$$\frac{\kappa_4}{4!\,\kappa_2^2} = \frac{3(n-7)(n^2 + 2n - 5)}{2(n-2)(n+5)(n+7)(n+9)}.$$






6. The Moments of the Simultaneous Distribution of Different Measures of 
Departure from Normality. 

It is obvious that the method of approach adopted in the foregoing sections 
is applicable to the determination of the moments of the distributions of, or 
more generally of the simultaneous distribution of, all measures of departure 
from normality such as 

$$\delta = k_4\,k_2^{-2},$$

$$\varepsilon = k_5\,k_2^{-5/2},$$

and so on. 

For $\delta$ and $\varepsilon$ we find the recurrence relations comparable with that already 
found for $\gamma$, namely 


n(n — 3) 
(n — 2)^® (n — 






t„ = ""e 


= f, c»<i» 1- 


R. A. Fisher. 

. .4 lOn (n 


n — 2 


1 ) 


(w- 


2)^''^ (w - 1)1/2 
n2 (w* 


('■•■-.-Tl)''- 


_ (to + 5 ) ^ g __ _ 

(n — 2)» (w — 2)»'» (n — l)'/a (n + 4) 


25) i 


e„_i. 


and by a mere repetition of the algebraic processes employed above, we may 
obtain a recurrence relation for the mean value of any expression of the form 

$$\gamma^a\,\delta^b\,\varepsilon^c \ldots,$$

from which the mean value in question may be derived. 

If, in accordance with the notation employed for the designation of the 
moments of the set of statistics $k_2, k_3, \ldots$, we represent such a mean product 
by 

$$\mu(\ldots 5^c\,4^b\,3^a\,2^{-r}),$$

where 

$$2r = 3a + 4b + 5c + \ldots,$$

so that $r$ is always an integer save for the odd moments which necessarily 
vanish, we may list the following formulæ: 


$$\mu(3^2\,2^{-3}) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)},$$

$$\mu(4^2\,2^{-4}) = \frac{24n(n-1)^2}{(n-3)(n-2)(n+3)(n+5)},$$

$$\mu(4\,3^2\,2^{-5}) = \frac{216n^2(n-1)^2}{(n-2)^2(n+1)(n+3)(n+5)(n+7)},$$

$$\mu(5^2\,2^{-5}) = \frac{120n^2(n+5)(n-1)^3}{(n-4)(n-3)(n-2)(n+1)(n+3)(n+5)(n+7)},$$

$$\mu(3^4\,2^{-6}) = \frac{108n^2(n-1)^2(n^2 + 27n - 70)}{(n-2)^3(n+1)(n+3)(n+5)(n+7)(n+9)},$$

$$\mu(4^3\,2^{-6}) = \frac{1728n(n-1)^3(n^2 - 5n + 2)}{(n-3)^2(n-2)^2(n+3)(n+5)(n+7)(n+9)},$$

$$\mu(3^6\,2^{-9}) = \frac{3240n^3(n-1)^3(n^4 + 84n^3 + 2695n^2 - 15168n + 20020)}{(n-2)^5(n+1)(n+3)(n+5)(n+7)(n+9)(n+11)(n+13)(n+15)}.$$


A comparison of these formulæ with those already given (Fisher, 1929), for 
the cumulative moment functions of $k_3, k_4, k_5$, which in every case but $\mu(3^4)$ 
and $\mu(3^6)$ are also the moments of the distribution, shows that 

$$\mu(3^2) = \frac{6n}{(n-1)(n-2)}\,\kappa_2^3,$$

$$\mu(4^2) = \frac{24n(n+1)}{(n-1)(n-2)(n-3)}\,\kappa_2^4,$$

$$\mu(4\,3^2) = \frac{216n^2}{(n-1)^2(n-2)^2}\,\kappa_2^5,$$

$$\mu(5^2) = \frac{120n^2(n+5)}{(n-1)(n-2)(n-3)(n-4)}\,\kappa_2^5,$$

$$\mu(4^3) = \frac{1728n(n+1)(n^2 - 5n + 2)}{(n-1)^2(n-2)^2(n-3)^2}\,\kappa_2^6.$$


Moreover 

$$\mu(3^4) = \kappa(3^4) + 3\kappa^2(3^2)$$

$$= \left\{\frac{648n^2(5n - 12)}{(n-1)^3(n-2)^3} + \frac{108n^2}{(n-1)^2(n-2)^2}\right\}\kappa_2^6$$

$$= \frac{108n^2(n^2 + 27n - 70)}{(n-1)^3(n-2)^3}\,\kappa_2^6,$$

and 

$$\mu(3^6) = \kappa(3^6) + 15\kappa(3^4)\,\kappa(3^2) + 15\kappa^3(3^2)$$

$$= 3240\,\kappa_2^9\left\{\frac{144(22n^2 - 111n + 142)\,n^3}{(n-1)^5(n-2)^5} + \frac{18(5n - 12)\,n^3}{(n-1)^4(n-2)^4} + \frac{n^3}{(n-1)^3(n-2)^3}\right\}$$

$$= \frac{3240n^3(n^4 + 84n^3 + 2695n^2 - 15168n + 20020)}{(n-1)^5(n-2)^5}\,\kappa_2^9.$$

In every case, therefore, the moment of the distribution of $\gamma, \delta, \varepsilon, \ldots$ is 
derivable by multiplying by 

$$\frac{(n-1)^r}{(n-1)(n+1)\cdots(n+2r-3)\,\kappa_2^r}$$

the corresponding moment of the distribution of $k_3, k_4, k_5, \ldots$. Since many of 

the latter moments may be found relatively expeditiously by means of the 
combinatorial procedure, this will be the quicker method for the more complex 
product moments. For moments of high degree, however, such as (3®) and 
(3^®) it does not seem easy to enumerate with certainty all the combinatorial 





patterns, and the recurrence method, though necessarily heavy, supplies a 
valuable check. 

An analytical proof of this relationship, or at least analytical grounds for 
accepting it as general, may be found by the method of transforming the 
characteristic function previously employed in demonstrating the rules of 
the combinatorial method. If 

$$M(t_1, t_2, \ldots)$$

is the characteristic function of the simultaneous distribution of the variates 
$x_1, x_2, \ldots$, and if 

$$M'(\tau_1, \tau_2, \ldots)$$

is that of variates $\xi_1, \xi_2, \ldots$, defined in terms of $x_1, x_2, \ldots$, by the relations 

$$\xi_1 = f_1(x_1, x_2, \ldots),$$

$$\xi_2 = f_2(x_1, x_2, \ldots),$$

then 

$$M'(\tau_1, \tau_2, \ldots) = e^{\tau_1 f_1 + \tau_2 f_2 + \ldots} \cdot M(t_1, t_2, \ldots)$$

at $t_1 = 0$, $t_2 = 0$, where $f_p$ in the index stands for $f_p\left(\dfrac{\partial}{\partial t_1}, \dfrac{\partial}{\partial t_2}, \ldots\right)$. 

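To apply this theorem to the present case we utilise the fact that in sampling 
from the normal distribution $k_2$ is distributed independently of $\gamma, \delta, \varepsilon, \ldots$, in 
the known distribution 

$$df = \frac{1}{\left(\frac{n-3}{2}\right)!}\left(\frac{n-1}{2\kappa_2}\right)^{\frac{1}{2}(n-1)} k_2^{\frac{1}{2}(n-3)}\,e^{-\frac{1}{2}(n-1)k_2/\kappa_2}\,dk_2,$$

of which the characteristic function is 

$$\left(1 - \frac{2\kappa_2 t_2}{n-1}\right)^{-\frac{1}{2}(n-1)}.$$

Hence the general characteristic function of the simultaneous distribution 
of $k_2, \gamma, \delta, \ldots$, is of the form 

$$\left(1 - \frac{2\kappa_2 t_2}{n-1}\right)^{-\frac{1}{2}(n-1)} M,$$

where $M$ is the sum of terms such as 

$$\mu(\ldots 5^c\,4^b\,3^a\,2^{-r})\,\frac{t_3^a\,t_4^b\,t_5^c \cdots}{a!\,b!\,c! \cdots},$$

and from this expression we must be able to derive the function 

$$M'(\tau_3, \tau_4, \ldots) = S\,\mu(\ldots 5^c\,4^b\,3^a)\,\frac{\tau_3^a\,\tau_4^b\,\tau_5^c \cdots}{a!\,b!\,c! \cdots}$$

by the action of the operator 

$$e^{\tau_3 D_3 D_2^{3/2} + \tau_4 D_4 D_2^{2} + \ldots}$$

where $D_p$ stands for $d/dt_p$. 

It appears, therefore, without discussing what meaning should be attached 
to the fractional indices, which find in fact only zero terms on which to operate, 
that 

$$\mu(\ldots 5^c\,4^b\,3^a) = \mu(\ldots 5^c\,4^b\,3^a\,2^{-r}) \cdot \frac{\partial^r}{\partial t_2^r}\left(1 - \frac{2\kappa_2 t_2}{n-1}\right)^{-\frac{1}{2}(n-1)}$$

at $t_2 = 0$, or that 

$$\mu(\ldots 5^c\,4^b\,3^a) = \mu(\ldots 5^c\,4^b\,3^a\,2^{-r}) \cdot \frac{(n-1)(n+1)\cdots(n+2r-3)}{(n-1)^r}\,\kappa_2^r,$$

which is the relationship required. 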
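
The relationship can be spot-checked in exact arithmetic. The sketch below assumes $\kappa_2 = 1$, takes the factor to be $(n-1)^r / \{(n-1)(n+1)\cdots(n+2r-3)\}$, and applies it to the second and fourth moments of $k_3$ to recover the second and fourth moments of $\gamma$. All function names are illustrative, not part of the paper.

```python
from fractions import Fraction


def ratio_multiplier(n, r):
    """(n-1)^r / [(n-1)(n+1)...(n+2r-3)]: the factor taking a moment of
    k_3, k_4, ... (with kappa_2 = 1) to the moment of gamma, delta, ..."""
    n = Fraction(n)
    den = Fraction(1)
    for j in range(r):
        den *= n - 1 + 2 * j
    return (n - 1) ** r / den


def mu_k3_second(n):
    """mu(3^2): second moment of k_3 with kappa_2 = 1."""
    n = Fraction(n)
    return 6 * n / ((n - 1) * (n - 2))


def mu_k3_fourth(n):
    """mu(3^4): fourth moment of k_3 with kappa_2 = 1."""
    n = Fraction(n)
    return 108 * n**2 * (n**2 + 27 * n - 70) / ((n - 1)**3 * (n - 2)**3)


def mu2_gamma(n):
    n = Fraction(n)
    return 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))


def mu4_gamma(n):
    n = Fraction(n)
    return (108 * n**2 * (n - 1)**2 * (n**2 + 27 * n - 70)
            / ((n - 2)**3 * (n + 1) * (n + 3) * (n + 5) * (n + 7) * (n + 9)))


if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, ratio_multiplier(n, 3) * mu_k3_second(n), mu2_gamma(n))
```

For $n = 3$ the fourth moment comes out as $27/8$, which agrees with $9\,\overline{\cos^4 3\phi}$ for a uniformly distributed angle.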


Summary. 

Two methods are given for discussing the distribution of the ratios of the 
symmetric functions $k_3, k_4, \ldots$, obtained from samples from a normal 
distribution to the powers of $k_2$ of the same degree. 

The first method consists in the development of recurrence relations expressing 
the ratios from a sample of n in terms of the corresponding ratios from a sample 
of n — 1 observations, and of a parameter distributed independently in a known 
distribution. Theoretically all properties of the general distribution could be 
obtained from these relations in conjunction with a study of samples of 3, 4, 
5 ... observations. 

The relations are used to derive the exact values of the first three even moments 
of the simplest ratio y, and of the simpler non-vanishing moments of the 
simultaneous distribution of all the ratios. It is observed that these moments 
are very simply related to the corresponding moments of the distribution of 
$k_3, k_4, \ldots$ given in a previous paper. 

The second method is an application of the method of symbolical operators 
developed by the author, which confirms the generality of the relationship 
found. The moments of the one distribution may thus be inferred directly 
from that of the other for which the combinatorial procedure is available. 



22.527a 


22 

INVERSE PROBABILITY 


AUTHOR’S NOTE 

This short paper to the Cambridge Philosophical Society was in¬ 
tended to introduce the notion of “fiducial probability,” and the 
type of inference which may be expressed in this measure. It opens 
with a discussion of the difficulties which had arisen from attempts 
to extend Bayes' theorem to problems in which the essential informa¬ 
tion on which Bayes' theorem is based is in reality absent, and passes 
on to relate the new measure to the likelihood function, previously 
introduced by the author, and to distinguish it from the Bayesian 
probability a posteriori. 

It is emphasised that statements of equality (exact statements) of 
fiducial probability can only be derived from statistics having con¬ 
tinuous distributions. It should also have been emphasised that the 
information they supply as to the unknown parameter should be ex¬ 
haustive. Only the case of a single parameter is discussed. The im¬ 
portance of the paper lies, however, in setting forth a new mode of 
reasoning from observations to their hypothetical causes. 


Reprinted from Proceedings of the Cambridge Philosophical Society, Vol. 
XXVI, Pt. 4, pp. 528-535, 1930. 



22.528 


Inverse Probability. By R. A. Fisher, Sc.D., F.R.S., Gonville and 
Caius College; Statistical Dept., Rothamsted Experimental Station. 


I know only one case in mathematics of a doctrine which has 
been accepted and developed by the most eminent men of their 
time, and is now perhaps accepted by men now living, which at the 
same time has appeared to a succession of sound writers to be 
fundamentally false and devoid of foundation. Yet that is quite 
exactly the position in respect of inverse probability. Bayes, who 
seems to have first attempted to apply the notion of probability, 
not only to effects in relation to their causes but also to causes in 
relation to their effects, invented a theory, and evidently doubted 
its soundness, for he did not publish it during his life. It was 
posthumously published by Price, who seems to have felt no doubt 
of its soundness. It and its applications must have made great 
headway during the next 20 years, for Laplace takes for granted 
in a highly generalised form what Bayes tentatively wished to 
postulate in a special case. 

Before going over the formal mathematical relationships in 
terms of which any discussion of the subject must take place, there 
are two preliminary points which require emphasis. First, it is not 
to be lightly supposed that men of the mental calibre of Laplace 
and Gauss, not to mention later writers who have accepted their 
views, could fall into error on a question of prime theoretical 
importance, without an uncommonly good reason. The underlying 
mental cause is not to be confused with the various secondary 
errors into which one is naturally led in deriving a formal 
justification of a false position, such as for example Laplace's introduction 
into his definition of probability of the unelucidated phrase “equally 
possible cases” which, since we must be taken to know what cases 
are equally possible before we know that they are equally probable, 
can only lead to the doctrine, known as the “doctrine of insufficient 
reason,” that cases are equally probable (to us) unless we have 
reason to think the contrary, and so reduces all probability to a 
subjective judgment. The underlying mental cause is, I suggest, 
not to be found in these philosophical entanglements, but in the 
fact that we learn by experience that science has its inductive 
processes, so that it is naturally thought that such inductions, being 
uncertain, must be expressible in terms of probability. In fact, 
the argument runs somewhat as follows: a number of useful but 
uncertain judgments can be expressed with exactitude in terms of 
probability; our judgments respecting causes or hypotheses are 
uncertain, therefore our rational attitude towards them is expressible 
in terms of probability. The assumption was almost a necessary 
one seeing that no other mathematical apparatus existed for dealing 
with uncertainties. 

The second point is that the development of the subject has 
reduced the original question of the inverse argument in respect 
of probabilities to the position of one of a series of quite analogous 
questions ; the hypothetical value, or parameter of the population 
under discussion, may be a probability, but it may equally be a 
correlation, or a regression, or, in genetical problems, a linkage 
value, or indeed any physical magnitude about which the 
observations may be expected to supply information. The introduction of 
quantitative variates, having continuous variation in place of simple 
frequencies as the observational basis, makes also a remarkable 
difference to the kind of inference which can be drawn. 

It will be necessary to summarise some quite obvious properties 
of these continuous frequency distributions. The probability that 
a variate $x$ should have a value in the range $x \pm \frac{1}{2}dx$ is expressed 
as a function of $x$ in the form 

$$df = \phi(x)\,dx.$$

The function $\phi$ depends of course on the particular population 
from which the value of $x$ is regarded as a random sample, and 
specifies the distribution in that population. If in the specification 
of the population one or more parameters, $\theta_1, \theta_2, \theta_3, \ldots$ are 
introduced, we have 

$$df = \phi(x, \theta_1, \theta_2, \theta_3, \ldots)\,dx,$$

where $\phi$ now specifies only the form of the population, the values 
of its parameters being represented by $\theta_1, \theta_2, \theta_3, \ldots$ 

Knowing the distribution of the variate $x$, we also know the 
distribution of any function of $x$, for if 

$$x = \chi(\xi)$$

we may substitute for $x$ and obtain the distribution of $\xi$ in the 
form 

$$df = \phi\{\chi(\xi)\}\,\frac{d\chi}{d\xi}\,d\xi.$$

Obviously the form of the distribution has changed; thus, if we 
know the frequency distribution of the time in which a number 
of men run 100 yards, we may derive the distribution of their 
velocities, which will be a different distribution, obtained simply 
by transforming $df$ as a differential element. In particular we must 
notice that the mean of the distribution is not invariant for such 
transformations, thus, if $\bar{x}$ and $\bar{\xi}$ are the means of their respective 
distributions, we shall not in general find that 

$$\bar{x} = \chi(\bar{\xi}).$$


Similarly, the mode, that is, the point, if there is one, at which $\phi$ 
has a maximum for variation of $x$, will not be invariant, for the 
equations 

$$\frac{d\phi}{dx} = 0, \qquad \frac{d}{d\xi}\left\{\phi\,\frac{dx}{d\xi}\right\} = 0$$

will not normally be satisfied by corresponding values. The central 
measure which is invariant, at least if $dx/d\xi$ is positive for all values, 
is the median, the value which divides the total frequency into two 
equal halves. For this point $f = \frac{1}{2}$, and the values of $x$ and $\xi$ will 
be necessarily in agreement. The same will be true of all other 
points defined by the value of $f$, so that we may have deciles, 
centiles, etc., dividing the frequency into 10 or 100 equal parts, 
and these will be invariant for any transformation for which $dx/d\xi$ 
is always positive. 
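
The running example can be made concrete with a few numbers. The nine times below are hypothetical, chosen only for illustration; the sketch converts each to a velocity and compares what happens to the mean and to the median. The transformation here is a decreasing one, which reverses the order of the observations but still carries the median to the median.

```python
from statistics import mean, median

# Hypothetical times, in seconds, for nine men running 100 yards.
times = [10.2, 10.5, 10.9, 11.0, 11.4, 11.8, 12.1, 12.6, 13.5]
speeds = [100 / t for t in times]  # yards per second

mean_time, mean_speed = mean(times), mean(speeds)
median_time, median_speed = median(times), median(speeds)

if __name__ == "__main__":
    # The mean velocity is not the velocity of the mean time ...
    print(100 / mean_time, mean_speed)
    # ... but the median velocity is the velocity of the median time.
    print(100 / median_time, median_speed)
```

An odd number of observations is used so that the sample median is a single observed value; with an even number the usual midpoint convention would introduce a small discrepancy that has nothing to do with the argument.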

All the above applies with no essential change to the more 
general case in which we have several observable variates x,y,z,... 
in place of one. 

The general statement of the inverse type of argument is as 
follows; we shall first cloak its fallacy under an hypothesis, and 
then examine it as an undisguised assumption. 

Suppose that we know that the population from which our 
observations were drawn had itself been drawn at random from a 
super-population of known specification; that is, suppose that we 
have a priori knowledge that the probability that $\theta_1, \theta_2, \theta_3, \ldots$ shall 
lie in any defined infinitesimal range $d\theta_1\,d\theta_2\,d\theta_3 \ldots$ is given by 

$$dF = \Psi(\theta_1, \theta_2, \theta_3, \ldots)\,d\theta_1\,d\theta_2\,d\theta_3 \ldots,$$

then the probability of the successive events (a) drawing from the 
super-population a population with parameters having the 
particular values $\theta_1, \theta_2, \theta_3, \ldots$ and (b) drawing from such a population 
the sample values $x_1, \ldots, x_n$, will have a joint probability 

$$\Psi(\theta_1, \theta_2, \theta_3, \ldots)\,d\theta_1\,d\theta_2\,d\theta_3 \ldots \times \prod_{p=1}^{n}\{\phi(x_p, \theta_1, \theta_2, \theta_3, \ldots)\,dx_p\}.$$

If we integrate this over all possible values of $\theta_1, \theta_2, \theta_3, \ldots$ and 
divide the original expression by the integral we shall then have 
a perfectly definite value for the probability (in view of the observed 
sample and of our a priori knowledge) that $\theta_1, \theta_2, \theta_3, \ldots$ shall lie 
in any assigned limits. 

This is not inverse probability strictly speaking, but a perfectly 
direct argument, which gives us the frequency distribution of the 
population parameters 0, from which we may, if we like, calculate 
their means, modes, medians or whatever else might be of use. 



The peculiar feature of the inverse argument proper is to say 
something equivalent to “We do not know the function $\Psi$ 
specifying the super-population, but in view of our ignorance of the actual 
values of $\theta$ we may take $\Psi$ to be constant.” Perhaps we might add 
that all values of $\theta$ being equally possible their probabilities are 
by definition equal; but however we might disguise it, the choice 
of this particular a priori distribution for the $\theta$'s is just as arbitrary 
as any other could be. If we were, for example, to replace our 
$\theta$'s by an equal number of functions of them, $\theta_1', \theta_2', \theta_3', \ldots$ all 
objective statements could be translated from the one notation to 
the other, but the simple assumption $\Psi(\theta_1, \theta_2, \theta_3, \ldots) = $ constant 
may translate into a most complicated frequency function for 
$\theta_1', \theta_2', \theta_3', \ldots$ 

If, then, we follow writers like Boole, Venn, and Chrystal in 
rejecting the inverse argument as devoid of foundation and 
incapable even of consistent application, how are we to avoid the 
staggering falsity of saying that however extensive our knowledge 
of the values of $x$ may be, yet we know nothing and can know 
nothing about the values of $\theta$? Inverse probability has, I believe, 
survived so long in spite of its unsatisfactory basis, because its 
critics have until recent times put forward nothing to replace it as 
a rational theory of learning by experience. 

The first point to be made belongs to the theory of statistical 
estimation; it has nothing to do with inverse probability, save for 
the historical accident that it was developed by Gauss in terms of 
that theory. 

If we make the assumption that $\Psi(\theta_1, \theta_2, \theta_3, \ldots) = $ constant, 
and if then we ignore everything about the inverse probability 
distribution so obtained except its mode or point at which the 
ordinate is greatest, we have to maximise 

$$\prod_{p=1}^{n}\{\phi(x_p, \theta_1, \theta_2, \theta_3, \ldots)\}$$

for variations of $\theta_1, \theta_2, \theta_3, \ldots$; and the result of this process will 
be the same whether we use the parameters $\theta_1, \theta_2, \theta_3, \ldots$ or any 
functions of them, $\theta_1', \theta_2', \theta_3', \ldots$ Two wholly arbitrary elements in 
this process have in fact cancelled each other out, the non-invariant 
process of taking the mode, and the arbitrary assumption that 
$\Psi$ is constant. The choice of the mode is thinly disguised as that of 
“the most probable value,” whereas had the inverse probability 
distribution any objective reality at all we should certainly, at least 
for a single parameter, have preferred to take the mean or the 
median value. In fact neither of these two processes has a logical 
justification, but each is necessary to eliminate the errors introduced 
by the other. 
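
The invariance of the maximising value under a change of parameter can be seen numerically. The following sketch is an illustration only: it assumes a binomial sample (7 successes in 20 trials, hypothetical figures), maximises the same likelihood once in the parameter $p$ and once in the transformed parameter $\phi = \log\{p/(1-p)\}$, and confirms that the two maxima correspond. The ternary search and all names are choices made for the example.

```python
from math import exp, log


def loglik_p(p, x=7, n=20):
    """Binomial log-likelihood in the parameter p (constant term omitted)."""
    return x * log(p) + (n - x) * log(1 - p)


def loglik_phi(phi, x=7, n=20):
    """The same log-likelihood expressed in phi = log(p / (1 - p))."""
    p = 1 / (1 + exp(-phi))
    return loglik_p(p, x, n)


def argmax(f, lo, hi, iters=200):
    """Ternary search for the maximum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


p_hat = argmax(loglik_p, 1e-9, 1 - 1e-9)
phi_hat = argmax(loglik_phi, -20.0, 20.0)

if __name__ == "__main__":
    print(p_hat, 1 / (1 + exp(-phi_hat)))
```

Both searches land on the same point, $p = 7/20$. A mean taken over a distribution that is uniform in $p$ and one that is uniform in $\phi$ would not agree in this way, which is the contrast the paragraph draws.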



The process of maximising $\Pi(\phi)$ or $S(\log\phi)$ is a method of 
estimation known as the “method of maximum likelihood”; it has 
in fact no logical connection with inverse probability at all. The 
facts that it has been accidentally associated with inverse 
probability, and that when it is examined objectively in respect of the 
properties in random sampling of the estimates to which it gives 
rise, it has shown itself to be of supreme value, are perhaps the 
sole remaining reasons why that theory is still treated with respect. 
The function of the $\theta$'s maximised is not however a probability and 
does not obey the laws of probability; it involves no differential 
element $d\theta_1\,d\theta_2\,d\theta_3 \ldots$; it does none the less afford a rational basis 
for preferring some values of $\theta$, or combination of values of the $\theta$'s, 
to others. It is, just as much as a probability, a numerical measure 
of rational belief, and for that reason is called the likelihood of 
$\theta_1, \theta_2, \theta_3, \ldots$ having given values, to distinguish it from the 
probability that $\theta_1, \theta_2, \theta_3, \ldots$ lie within assigned limits, since in common 
speech both terms are loosely used to cover both types of logical 
situation. 

If A and B are mutually exclusive possibilities the probability 
of “A or B” is the sum of the probabilities of A and of B, but the 
likelihood of A or B means no more than “the stature of Jackson 
or Johnson”; you do not know what it is until you know which is 
meant. I stress this because in spite of the emphasis that 
I have always laid upon the difference between probability and 
likelihood there is still a tendency to treat likelihood as though it 
were a sort of probability. 

The first result is thus that there are two different measures 
of rational belief appropriate to different cases. Knowing the 
population we can express our incomplete knowledge of, or 
expectation of, the sample in terms of probability; knowing the 
sample we can express our incomplete knowledge of the population 
in terms of likelihood. We can state the relative likelihood that 
an unknown correlation is +0.6, but not the probability that it lies 
in the range .595 to .605. 

There are, however, certain cases in which statements in terms 
of probability can be made with respect to the parameters of the 
population. One illustration may be given before considering in 
what ways its logical content differs from the corresponding state¬ 
ment of a probability inferred from known a priori probabilities. 
In many cases the random sampling distribution of a statistic, $T$, 
calculable directly from the observations, is expressible solely in 
terms of a single parameter, $\theta$, of which $T$ is the estimate found by 
the method of maximum likelihood. If $T$ is a statistic of continuous 
variation, and $P$ the probability that $T$ should be less than any 
specified value, we have then a relation of the form 

$$P = F(T, \theta).$$



If now we give to $P$ any particular value such as .95, we have 
a relationship between the statistic $T$ and the parameter $\theta$, such 
that $T$ is the 95 per cent. value corresponding to a given $\theta$, and this 
relationship implies the perfectly objective fact that in 5 per cent. 
of samples $T$ will exceed the 95 per cent. value corresponding to 
the actual value of $\theta$ in the population from which it is drawn. To 
any value of $T$ there will moreover be usually a particular value 
of $\theta$ to which it bears this relationship; we may call this the 
“fiducial 5 per cent. value of $\theta$” corresponding to a given $T$. If, 
as usually if not always happens, $T$ increases with $\theta$ for all possible 
values, we may express the relationship by saying that the true 
value of $\theta$ will be less than the fiducial 5 per cent. value 
corresponding to the observed value of $T$ in exactly 5 trials in 100. By 
constructing a table of corresponding values, we may know as soon 
as $T$ is calculated what is the fiducial 5 per cent. value of $\theta$, and 
that the true value of $\theta$ will be less than this value in just 5 per 
cent. of trials. This then is a definite probability statement about 
the unknown parameter $\theta$, which is true irrespective of any 
assumption as to its a priori distribution. 
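
The frequency statement can be tried out by simulation in a case simpler than any the paper treats. The sketch below assumes, purely for illustration, a statistic $T$ normally distributed about $\theta$ with unit standard deviation, so that the fiducial 5 per cent. value of $\theta$ is $T$ less the 95 per cent. point of the standard normal deviate. The function names, the seed and the number of trials are arbitrary choices.

```python
import random
from statistics import NormalDist

UPPER_95 = NormalDist().inv_cdf(0.95)  # about 1.645


def fiducial_5_percent_value(t):
    """The theta for which the observed t is the 95 per cent. point,
    assuming T ~ Normal(theta, 1)."""
    return t - UPPER_95


def fraction_below(theta, trials=20000, seed=0):
    """Proportion of trials in which the true theta is less than the
    fiducial 5 per cent. value computed from the sample statistic."""
    rng = random.Random(seed)
    below = 0
    for _ in range(trials):
        t = rng.gauss(theta, 1.0)
        if theta < fiducial_5_percent_value(t):
            below += 1
    return below / trials


if __name__ == "__main__":
    for theta in (-3.0, 0.0, 7.5):
        print(theta, fraction_below(theta))
```

Whatever value of $\theta$ is supplied, the proportion settles near 0.05, and no a priori distribution for $\theta$ enters the calculation.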


For example, if $r$ is a correlation derived from only four pairs of 
observations, and $\rho$ is the correlation in the population from which 
the sample was drawn, the relation between $\rho$ and the 95 per cent. 
value of $r$ is given in the following table, which has been calculated, 
from the distribution formula I gave in 1915, by Miss F. E. Allan. 

 Fiducial 5%    95%       Fiducial 5%    95%       Fiducial 5%    95%
     ρ           r            ρ           r            ρ           r

 -.995055    -.968551     -.761594    +.145340     +.761594    +.989816
 -.993963    -.961623     -.716298    +.270475     +.800499    +.991770
 -.992632    -.953179     -.664037    +.388574     +.833655    +.993336
 -.991007    -.942891     -.604368    +.496089     +.861723    +.994593
 -.989027    -.930375     -.537050    +.590725     +.885352    +.995608

 -.986614    -.915151     -.462117    +.671557     +.905148    +.996427
 -.983675    -.896661     -.379949    +.738849     +.921669    +.997091
 -.980096    -.874240     -.291313    +.793711     +.935409    +.997628
 -.975743    -.847110     -.197375    +.837715     +.946806    +.998066
 -.970452    -.814372     -.099668    +.8725?0     +.956237    +.998421

 -.964028    -.775019      0          +.900000     +.964028    +.998711
 -.956237    -.727916     +.099668    +.921432     +.970452    +.998946
 -.946806    -.671918     +.197375    +.938146     +.975743    +.999139
 -.935409    -.605881     +.291313    +.951174     +.980096    +.999296
 -.921669    -.528824     +.379949    +.961338     +.983675    +.999424

 -.905148    -.440127     +.462117    +.969286     +.986614    +.999529
 -.885352    -.339761     +.537050    +.975519     +.989027    +.999615
 -.861723    -.228562     +.604368    +.980424     +.991007    +.999686
 -.833655    -.108446     +.664037    +.984298     +.992632    +.999742
 -.800499    +.017528     +.716298    +.987371     +.993963    +.999789

 -.761594    +.145340     +.761594    +.989816     +.995055    +.999827

From the table we can read off the 95 per cent. $r$ for any given $\rho$, 
or equally the fiducial 5 per cent. $\rho$ for any given $r$. Thus if a 
value $r = .99$ were obtained from the sample, we should have a 
fiducial 5 per cent. $\rho$ equal to about .765. The value of $\rho$ can then 
only be less than .765 in the event that $r$ has exceeded its 95 per 
cent. point, an event which is known to occur just once in 20 trials. 
In this sense $\rho$ has a probability of just 1 in 20 of being less than 
.765. In the same way, of course, any other percentile in the 
fiducial distribution of $\rho$ could be found or, generally, the fiducial 
distribution of a parameter $\theta$ for a given statistic $T$ may be 
expressed as 

$$df = -\frac{\partial}{\partial\theta}F(T, \theta)\,d\theta,$$

while the distribution of the statistic for a given value of the 
parameter is 

$$df = \frac{\partial}{\partial T}F(T, \theta)\,dT.$$
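
Reading the table in the inverse direction amounts to interpolation between neighbouring rows. The sketch below uses the two tabulated rows that bracket $r = .99$ and interpolates linearly; linear interpolation is an assumption made for illustration, not a method stated in the paper, and the function name is hypothetical.

```python
# (rho, 95 per cent. value of r) for samples of four pairs of observations:
# the two adjacent rows of the table that bracket r = .99.
ROWS = [(0.761594, 0.989816), (0.800499, 0.991770)]


def fiducial_rho(r, rows=ROWS):
    """Fiducial 5 per cent. value of rho for an observed r, by linear
    interpolation between two tabulated rows."""
    (rho0, r0), (rho1, r1) = rows
    if not r0 <= r <= r1:
        raise ValueError("r lies outside the tabulated interval")
    return rho0 + (r - r0) * (rho1 - rho0) / (r1 - r0)


if __name__ == "__main__":
    print(round(fiducial_rho(0.99), 3))
```

The interpolated value agrees with the figure of about .765 quoted in the text.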

I imagine that this type of argument, which supplies definite 
information as to the probability of causes, has been overlooked by 
the earlier writers on probability, because it is only applicable to 
statistics of continuous distribution, and not to the cases in regard 
to which the abstract arguments of probability theory were 
generally developed, in which the objects of observation were classified 
and counted rather than measured, and in which therefore all 
statistics have discontinuous distributions. Now that a number of 
problems of distribution have been solved, for statistics having 
continuous distribution, arguments of this type force themselves on 
our attention; and I have recently received from the American 
statistician, Dr M. Ezekiel, graphs giving to a good approximation 
the fiducial 5 per cent. points of simple and multiple correlations 
for a wide range of cases. It is therefore important to realise 
exactly what such a probability statement, bearing a strong 
superficial resemblance to an inverse probability statement, really means. 

The fiducial frequency distribution will in general be different 
numerically from the inverse probability distribution obtained from 
any particular hypothesis as to a priori probability. Since such an 
hypothesis may be true, it is obvious that the two distributions 
must differ not only numerically, but in their logical meaning. It 
would be perfectly possible, for example, to find an a priori 
frequency distribution for $\rho$ such that the inverse probability that 
$\rho$ is less than .765 when $r = .99$ is not 5 but 10 in 100. In concrete 
terms of frequency this would mean that if we repeatedly selected 
a population at random, and from each population selected a sample 
of four pairs of observations, and rejected all cases in which the 
correlation as estimated from the sample ($r$) was not exactly .99, 
then of the remaining cases 10 per cent. would have values of $\rho$ 
less than .765. Whereas apart from any sampling for $\rho$, we know 
that if we take a number of samples of 4, from the same or from 
different populations, and for each calculate the fiducial 5 per cent. 
value for $\rho$, then in 5 per cent. of cases the true value of $\rho$ will be 
less than the value we have found. There is thus no contradiction 
between the two statements. The fiducial probability is more 
general and, I think, more useful in practice, for in practice our 
samples will all give different values, and therefore both different 
fiducial distributions and different inverse probability distributions. 
Whereas, however, the fiducial values are expected to be different 
in every case, and our probability statements are relative to such 
variability, the inverse probability statement is absolute in form 
and really means something different for each different sample, 
unless the observed statistic actually happens to be exactly the 
same. 



23.xxva 


23 

THE SAMPLING ERROR OF ESTIMATED 
DEVIATES, TOGETHER WITH OTHER IL¬ 
LUSTRATIONS OF THE PROPERTIES AND 
APPLICATIONS OF THE INTEGRALS AND 
DERIVATIVES OF THE NORMAL ERROR 
FUNCTION 


AUTHOR’S NOTE 

This section of the introduction to Volume I of the series of 
mathematical tables published by the British Association for the 
Advancement of Science is reprinted principally for the sake of the treatment 
in Application 1 of the sampling error of estimated deviation, which 
had become inaccessible. 

The example is remarkable in that the quantities a and a are each 
defined as a function of the other, having, when the other is known, 
the property that, while being given functions of the statistics and 
the parameters, their distributions are independent of both parameters.
The solution may be used directly to find the probability that
the proportion of defective parts in a consignment of which a sample 
has been tested shall exceed any assigned value; it is therefore of 
practical importance in the critical drafting of specifications. 


* Reprinted from Mathematical Tables, Vol. I, pp. xxvi-xxxv, British
Association for the Advancement of Science, 1931.



23.xxvi 


Properties of the Functions 


Introductory 


1. The Hermite polynomial, $H_n$, is defined by the equation

\[ \left(-\frac{d}{dx}\right)^n e^{-\frac{1}{2}x^2} = H_n\, e^{-\frac{1}{2}x^2}; \]

from this it follows that 


(0 


from which the polynomials may easily be written down in succession, the 
coefficient of the highest power of x being unity. 

Also 


= xH^ + - ^) ( - *«■**’) * V*** = (2) 


Hence is established the recurrence formula 


\[ H_{n+1} - xH_n + nH_{n-1} = 0. \qquad (3) \]



23.xxvii 


Writing this in the form 
and substituting 


we find the differential equation 


\[ \frac{d^2H_n}{dx^2} - x\frac{dH_n}{dx} + nH_n = 0, \qquad (4) \]

an equation by means of which we can define the Hermite function for
non-integral values of n.
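The recurrence $H_{n+1} = xH_n - nH_{n-1}$ and the differential equation $H_n'' - xH_n' + nH_n = 0$ can be checked mechanically for integral n. The following is a minimal sketch, not part of the original text; the function names are illustrative, and it assumes the convention stated above in which the coefficient of the highest power of x is unity, with $H_0 = 1$.

```python
def hermite(n):
    """Coefficients of H_n, lowest power first, built from
    H_{k+1} = x*H_k - k*H_{k-1} with H_0 = 1."""
    prev, cur = [0], [1]  # placeholder for H_{-1} (it is multiplied by k = 0) and H_0
    for k in range(n):
        nxt = [0] + cur               # multiply H_k by x
        for i, c in enumerate(prev):  # subtract k * H_{k-1}
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def derivative(p):
    """Coefficients of dp/dx for a polynomial given lowest power first."""
    return [i * c for i, c in enumerate(p)][1:]


def ode_residual(n):
    """Coefficients of H_n'' - x*H_n' + n*H_n; all should be zero."""
    h = hermite(n)
    d1 = derivative(h)
    d2 = derivative(d1)
    x_d1 = [0] + d1  # x * H_n'
    res = []
    for i in range(len(h)):
        second = d2[i] if i < len(d2) else 0
        first = x_d1[i] if i < len(x_d1) else 0
        res.append(second - first + n * h[i])
    return res


if __name__ == "__main__":
    for n in range(6):
        print(n, hermite(n), ode_residual(n))
```

Integer arithmetic is used throughout, so the residuals vanish exactly rather than to rounding error.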

Turning now to the closely related functions defined by 

V 2ir 

we have by definition the relation 

(S) 

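and evidently also the recurrence formula,

\[ G_{n+2} - xG_{n+1} + (n+1)G_n = 0, \qquad (6) \]

whence it follows that $G_n$ satisfies the differential equation

\[ \frac{d^2G_n}{dx^2} + x\frac{dG_n}{dx} + (n+1)G_n = 0. \qquad (7) \]

The connection of this equation with equation (4) is most clearly shown by
writing

\[ m = -(n+1), \]

giving the equation

\[ \frac{d^2G_{-(m+1)}}{dx^2} + x\frac{dG_{-(m+1)}}{dx} - mG_{-(m+1)} = 0, \qquad (8) \]

where (8) is the same equation in $ix$ as (4) is in $x$.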

2. The Orthogonal Relation of G and H. —Since 


the integral 


-dx^" 




of which the first part vanishes, for $G_{n-1}$ is a polynomial multiplied by $e^{-\frac{1}{2}x^2}$. If

$m$ is less than $n$, the repetition of the process shows that

which is zero, for $H_m$ is a polynomial of degree $m$. Since, moreover,

it follows that if $m$ and $n$ are any unequal integers, the integral is zero. If
$m = n$, we have also

(9) 

3. The Integrals of the Probability Integral.—The functions $I_n$ have hitherto
been defined only when $n$ is a negative integer; for such values equation (5) may
be written



23.xxviii 



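for $I_n$ must tend to zero as $x$ tends to infinity. We may therefore, commencing
with $I_{-1} = G_0$, define

\[ I_0 = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac{1}{2}t^2}\,dt, \]

and define $I_n$ for positive integers by means of equation (10). We may then
reduce $I_n$ to a single integral by successive integrations by parts, for

\[ I_n = \int_x^\infty I_{n-1}\,dt = \Big[(t-x)\,I_{n-1}\Big]_x^\infty + \int_x^\infty (t-x)\,I_{n-2}\,dt, \]

leading after $n$ stages to

\[ I_n = \frac{1}{\sqrt{2\pi}} \int_x^\infty \frac{(t-x)^n}{n!}\, e^{-\frac{1}{2}t^2}\,dt, \qquad (11) \]

and to the definite integral

\[ I_n = \frac{1}{\sqrt{2\pi}} \int_0^\infty \frac{t^n}{n!}\, e^{-\frac{1}{2}(t+x)^2}\,dt. \qquad (12) \]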

To show that /„ satisfies the differential equation (8), observe that by 
differentiating equation (11) we have 


/„ = , r ( - dt - -L r *(n 

Vz-nJx V t-xJ n\ Vzn'o t .n\ 


whence 


therefore 


dx?' ” V 2ttJ() f . ni 


(/-x)n! 


(f., + )/„ - r . 

\dx^ dx^ ” \/27rJx nl 


as is required. It will be noted that equations (11) and (12) define the function
$I_n$ for all values of $n$ greater than $-1$ (using the ordinary generalisation of the
factorial), and that the function so defined satisfies the differential equation (8)
and the recurrence formula

\[ (n+1)\,I_{n+1} + xI_n - I_{n-1} = 0. \qquad (13) \]
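As a numerical illustration, the recurrence $(n+1)I_{n+1} + xI_n - I_{n-1} = 0$ can be compared with the single-integral form $I_n = \frac{1}{\sqrt{2\pi}}\int_0^\infty \frac{t^n}{n!} e^{-\frac{1}{2}(t+x)^2}dt$. The sketch below is an addition, not part of the original text. It assumes that $I_{-1}$ is the normal ordinate and $I_0$ the upper tail area of the normal curve; the function names, the truncation point of the integral and the step count are arbitrary choices.

```python
import math


def phi(x):
    """Normal ordinate, taken here as I_{-1}(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def I_recurrence(n, x):
    """I_n(x) for integer n >= -1 from (k+1) I_{k+1} = I_{k-1} - x I_k."""
    if n < -1:
        raise ValueError("n must be at least -1")
    prev = phi(x)                              # I_{-1}
    cur = 0.5 * math.erfc(x / math.sqrt(2.0))  # I_0, the upper tail area
    if n == -1:
        return prev
    for k in range(n):
        prev, cur = cur, (prev - x * cur) / (k + 1)
    return cur


def I_integral(n, x, upper=12.0, steps=20000):
    """Simpson's rule applied to the integral form, truncated at t = upper."""
    h = upper / steps

    def f(t):
        return t ** n / math.factorial(n) * phi(t + x)

    total = f(0.0) + f(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(j * h)
    return total * h / 3.0


if __name__ == "__main__":
    for n in range(5):
        print(n, I_recurrence(n, 0.5), I_integral(n, 0.5))
```

The forward recurrence loses accuracy by cancellation when x is large and positive and n is large, so the sketch is meant only for moderate arguments.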

4. The Value of /„(o).—Substituting zero for x in equation (ii), we have 

K’dt 

V 27tJo n ! 


or, if z — 


/„(o) = 


Va^.w! 


For negative integers also, if /„^G_t„+i,, noting that G„Xo) = — — 

V Ztt 

that Il,n{o) is zero when m is odd, but when m is even reduces to 


or, if w= -(n + i), to 


we should still have 


3) • • • 5-3 

2 *c« ( 1 )( 1 U 

(n + 2 )(n 4 - 4 ). . •('• 3 )(-i) = — 




23 .xxix 



so that a function $I_n$ defined for negative values of $n$ so as to have this value when
$x$ is zero, and to satisfy the differential equation (8), will reduce at integral values
to the function

5. The Solutions of the Differential Equations .—The differential equation (8) 
yields an indicial equation /(/~i)=o, the two roots corresponding to odd and 
even functions which satisfy the equation; if 

\[ y_1 = x + A_3x^3 + A_5x^5 + \cdots \]

is the odd solution, then substituting in the equation we find

\[ 2r(2r+1)\,A_{2r+1} + (2r-1-n)\,A_{2r-1} = 0, \]

or

\[ y_1 = x + \frac{n-1}{3!}\,x^3 + \frac{(n-1)(n-3)}{5!}\,x^5 + \cdots, \]

a series which is absolutely convergent for all values of $n$ and $x$, and reduces to
a polynomial if $n$ is an odd positive integer. Similarly the even solution is
seen to be

\[ y_2 = 1 + \frac{n}{2!}\,x^2 + \frac{n(n-2)}{4!}\,x^4 + \cdots, \]


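which also is absolutely convergent, and reduces to a polynomial for even positive
integral values of $n$. Since the function $I_n$ has at $x = 0$ a known value and a known
differential coefficient, we may express it in terms of $y_1$ and $y_2$ in the form

\[ I_n = \frac{2^{-\frac{1}{2}(n+2)}}{(\frac{1}{2}n)!}\,y_2 - \frac{2^{-\frac{1}{2}(n+1)}}{\{\frac{1}{2}(n-1)\}!}\,y_1, \qquad (15) \]

which shall define $I_n$ for all values of $n$. We may note that

\[ I_n(x) + I_n(-x) = \frac{2^{-\frac{1}{2}n}}{(\frac{1}{2}n)!}\,y_2, \]

and this will be a polynomial in $x$ when $n$ is an even positive integer, while

\[ I_n(-x) - I_n(x) = \frac{2^{-\frac{1}{2}(n-1)}}{\{\frac{1}{2}(n-1)\}!}\,y_1, \]

and this will be a polynomial in $x$ when $n$ is an odd positive integer.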

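Reversing the order of the terms in these polynomials we have, when $n$ is
even,

\[ I_n(x) + I_n(-x) = \frac{x^n}{n!} + \frac{1}{2}\,\frac{x^{n-2}}{(n-2)!} + \frac{1}{2\cdot 4}\,\frac{x^{n-4}}{(n-4)!} + \cdots, \qquad (16) \]

which is also the form for $I_n(-x) - I_n(x)$, when $n$ is odd.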

The polynomials represented by (16) correspond closely to the Hermite
polynomials, and may be written $H_n^*/n!$. Then it is easy to verify that








and that 



23.XXX 

and so to derive the relations

\[ H^*_{n+1} - xH^*_n - nH^*_{n-1} = 0. \]

It is of more interest that, defining the Hermite function //„ as 

substituting in equation (15) and identifying the odd and even parts with the 
corresponding solutions of tlie differential equation for //„, we have 


and 


and 


V ! «; 1 / ! *;! 






which reduces to the polynomial form for positive integers. 


Applications 

1. The Sampling Error of Estimated Deviates.—If the population sampled is
specified by

\[ df = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx, \]

then from a sample we can calculate the estimates 

x=- .S{x) “ S{x - x)^ 

n ' n 

and the simultaneous distribution of x and 5 is known to be 


df-T— 






^ ^-TT rr 


(17) 


If now $\tau$ is the true deviation of any point in terms of the true standard
deviation $\sigma$, and $t$ is the apparent deviation in terms of $s$,

x+ - m-i- crcc 

and for each value of flt^<iwill have a determinate distribution. 

Changing the variates from x and 5 to / and Sy we have 

dx= - Ada. 

and 

"x — m—crat ^ A.a. 

SO that the index of the exponential term is 




23.xxxi 



and, writing $u$ for $s/\sigma$, we obtain

df^ - - -(i8) 

Now $u$ varies in value from 0 to $\infty$, and the integral between these limits of
the frequency factor involving $u$ may be written

(« -1) l{n{i 

giving the distribution of cl in the form 


Since 






(19) 


/«-i(o) 


2-*(n+l) 


when T = o the distribution reduces to 

(—)• 




which is “Student’s” distribution.

In considering the approximation to normality of the general distribution
(19), we shall require an approximate value of the integral function with
negative argument proportional to $\sqrt{n}$, for the case when $n$ is large. For this
purpose we shall use the formula, wherein $x = -2\sqrt{n}\,\sinh\phi$,

« , . V^27r« 


n\ V 2 cosh <f> 
or, neglecting terms which do not contain 

Differentiating with respect to Xy since 

^ _ I d 

dx 2 V n cosh S dd> 


( 20 ) 


and since 


^(log A-i)= - 

dx tV n 

~JT+¥W* 


the approximate equation for the mode is 


nt htH 

a 7 


which is satisfied by 
for at this point 


I +/*"(!+/*)*"(! +/»)»/*' 
t^T 

2 sinh 

Vl+T* 


(«) 




whence 




23 .xxxii 


z cosh <!> = 


and 


Vl+T* 


Using this relation, 


a/i +T* 


^ Oog I)= -Vn.e* 

^ T\ -e-* 

dx^ ( ^ 2 cosh <l> 

£ 3 (iog/)=: 


-V« 
VrTrS 
-1 

2 + T® 


and 


_I_ - 2(1 

4 \/n cosh® <f> 'V «(2 + T*)® 

- ^ iXaa n = ~3sinh<^ _ -6 t®(i 

^ ^ 8/1 cosh®~ «(2 +t2)® 


Bx _- V/IT ^ 2 ^nTt 

Bi “ (I + t^fT^ Bt^ "" (i + 

5 ®jc 3 \/»t(i-4/®) i5v/wt/(4<® - 3) 

c>/® "" (F+y®)’/* Bt^" (!+«*)»/* 

consequently, at the mode, 

c)® ,, _ 3nT® 

^(log/^-- - 


( 22 ) 


(23) 


(l +t2)®(2+T*) 

(log /) = - : -'J(Lz 4L^) +_9«'r"__ _ 

^*«g n (I + (, ,.2)4(2 + T®) ^ ( 1 + t2)®(2 + T®)® 


(iog /) - 


6«t* 


- I smt2(4t® - 3) nT®(7ST® - 12) 

(1 + F)® (14 T®)®(2 + T®) 0 y tY( 2 + T®)® (l + T*)*(2 + T®)® 


36«t* 


(24) 


The second derivative of the ordinate at the mode is therefore 

w( 1 - WT®( i - 3<®) _ 3 ? it® 


■( 14 -/®)®"- (1+/®^® (iVr®)® (HT®)®(2-hT®) 2+T® 

for large samples we infer that the variance of $t$ is given by

\[ nV(t) = 1 + \tfrac{1}{2}\tau^2. \]

The third derivative is

2/i/( 3 - < ®) _ £2«T®f( V - <®) _ 3?it(i - 4T*) 9MT®^_ 2WT® ^ 2«t(I2 + 5T®) 

"(i +y^® (14 (1 + T®)* (I + T ®)^(2 + T®) (I + r ®)®(2 + T®)® (2 + rf 

Equating this to yi‘T,®, we find 


(25) 


t(i2 + 5T®) 


(26) 


Finally, the fourth derivative is 

6 n( 1 - 6 /® + <*) I 2 «t®( I - io^® + 5/*) . 15 WT ®(3 - 41^) ^ 3 ” t ®(4 - 25T® ) 

(1+/®)® (l"+Z®)® (l+T®)® (l+T®)®(2+T®) 


36«t 


6nT® 


6« 


(I + T®)®(2 + T®)® (1 + T*)®(2 4- T®)® 


" ( 2 + T« y » ^ 3 ^ - 32t» - 40T* - 9t') 

and on equating this to y^af, we have 


3 32 - 32 t ®- 40 t *- 9 t ® 

(2+r®)® 


(27) 





23.xxxiii 




The value of $\gamma_1$, measuring asymmetry of the third moment, ranges from
zero when $\tau = 0$, to $5/\sqrt{2n}$ when $\tau$ is infinite, thus showing a very moderate
asymmetry. The value of $\gamma_2$, which measures the departure of the fourth
moment from its normal value, varies from $6/n$ when $\tau = 0$, to $-27/2n$ when
$\tau$ is infinite. The nature of the approximations found above, appropriate for
large samples, is illustrated by the fact that when $\tau = 0$, the exact value of $\gamma_2$ for
“Student’s” distribution is $6/(n - 5)$.
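The closing comparison can be tabulated directly: the large-sample value 6/n against the exact value 6/(n - 5) quoted for "Student's" distribution. The lines below are an illustrative addition with hypothetical function names, and assume n denotes the sample size as in the text.

```python
def gamma2_large_sample(n):
    """Large-sample approximation to gamma_2 at tau = 0."""
    return 6.0 / n


def gamma2_student_exact(n):
    """Exact gamma_2 quoted for "Student's" distribution; needs n > 5."""
    if n <= 5:
        raise ValueError("the fourth moment requires n > 5")
    return 6.0 / (n - 5)


if __name__ == "__main__":
    for n in (10, 30, 100, 1000):
        approx, exact = gamma2_large_sample(n), gamma2_student_exact(n)
        print(n, approx, exact, exact / approx)
```

The ratio of the two values is n/(n - 5), so the approximation is poor for samples of ten and close for samples of a thousand.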


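2. Truncated Normal Distribution.—If, in a normal distribution with mean
$m$ and standard deviation $\sigma$, all record is omitted of individuals below a given
value, which we may write $m + \sigma\xi$, the frequency of the truncated distribution
in the range $dx$ will be

\[ \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx \div \frac{1}{\sqrt{2\pi}}\int_\xi^\infty e^{-\frac{1}{2}t^2}\,dt = \frac{1}{\sigma\sqrt{2\pi}\,I_0(\xi)}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx, \qquad (28) \]

provided that $x$ exceeds $m + \sigma\xi$.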

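The moments of the distribution about its terminus are easily expressed
in terms of the $I$ functions, for

\[ \mu_n' = \frac{1}{\sigma\sqrt{2\pi}\,I_0(\xi)} \int_0^\infty x^n\, e^{-\frac{(x+\sigma\xi)^2}{2\sigma^2}}\,dx = \sigma^n\, n!\,\frac{I_n(\xi)}{I_0(\xi)}. \qquad (29) \]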


It is noteworthy that in the estimation of the unknown parameters $\xi$ and $\sigma$,
the method of moments gives in this case the same solution as the method of
maximum likelihood; and is therefore, in this case, efficient.

For, apart from a constant, taking the origin at the terminus. 




- « log a - 


2cr® 


n log 4 


SO that the equations of maximum likelihood are 


but 




SO that equation (30 bis) may be written 


(30) 

(30 bis) 


which is the equation obtained by equating the first moment of the sample to 
that of the population. Substituting, now, in (30) the value found for x^ we have 


but 

hence 


a/.-/,-!/. 

S(*«)-zii<4 (3J) 

■*0 


which is the equation obtained by equating the second moment of the sample 
to that of the population. 

Eliminating $\sigma$ between the two equations, the maximum likelihood solution
for $\xi$ must satisfy the equation

\[ \frac{n\,S(x^2)}{S^2(x)} = \frac{2I_0I_2}{I_1^2}, \qquad (32) \]



23 .xxxiv 



and the practical solution consists in entering the observed value of the quantity
on the left in a table showing $\xi$, for values of the quantity on the right.
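In place of a printed table, the quantity $2I_0I_2/I_1^2$ can be inverted numerically. The sketch below is an illustration only, under several assumptions: the observations are measured from the terminus, the quantity entered is $nS(x^2)/S^2(x)$ (the ratio of the second sample moment to the square of the first), $I_0$ is the upper tail area of the normal curve with $I_1$ and $I_2$ obtained from $(n+1)I_{n+1} + xI_n - I_{n-1} = 0$, and the ratio increases steadily with $\xi$ over the bracket searched. All names are hypothetical.

```python
import math


def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def moment_ratio(xi):
    """2 I_0 I_2 / I_1^2 for the truncated normal with terminus at xi."""
    i0 = 0.5 * math.erfc(xi / math.sqrt(2.0))
    i1 = phi(xi) - xi * i0
    i2 = (i0 - xi * i1) / 2.0
    return 2.0 * i0 * i2 / (i1 * i1)


def sample_ratio(xs):
    """n S(x^2) / S(x)^2, with x measured from the point of truncation."""
    n = len(xs)
    return n * sum(x * x for x in xs) / sum(xs) ** 2


def solve_xi(ratio, lo=-5.0, hi=5.0, iterations=200):
    """Bisection for xi; assumes moment_ratio is increasing on [lo, hi]."""
    if not moment_ratio(lo) < ratio < moment_ratio(hi):
        raise ValueError("ratio lies outside the bracketed range")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    xs = [0.2, 0.9, 1.4, 0.5, 2.3, 0.1, 1.1, 0.7]  # made-up distances from the terminus
    r = sample_ratio(xs)
    print(r, solve_xi(r))
```

With $\xi$ found, $\sigma$ would follow from the first-moment equation; that step is omitted here.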

The precision is determined by the second differential coefficients 


an V aH V 


Using 

and 

it is clear that 
and 

If, therefore, 

A = 


then 


B^L nx n /j 
a'^ a 

gp «\I+ / 

-/_,n 

V“ - V - A(A + ^/o) - 2 AA - A" 


(33) 


«/ /., \ n L 

~a~j„ 



n /, / /g A”\ 

-al[ 


-X' 


= -J/Aa'"* - M2' Ml'" + - 2a2Mi'2} 


K(a) 




A wlMa'Ma + cr'‘(2M2 - M2')} 
A] I <^"(M2' 


(34) 

(34 bis) 

(35) 


A «!M2'M2 + ""(2^2 - M2')} 

and the correlation between the sampling errors of $\sigma$ and $\xi$ will be

I 

V' M2(/i2' + 

3. Modified Poisson Series.—In the simple Poisson series the frequency with
which the variate takes the integral value $x$ is

\[ e^{-m}\,\frac{m^x}{x!}. \]

If, now, $m$ is a variate with distribution specified by

: dm 




_(’")VS'-^ 



(36) 


as, for example, is the standard deviation as estimated from a sample, the frequency 
of the value x will be 

( 37 ) 


and this is equal to 


(38) 




23.xxxv



putting $x = 0, 1, 2, \ldots$ we have the modified series, in which evidently

(1^), ( 3 ,) 

A more general distribution of m which develops a modified series in I 
functions is 

d/= - ^dm 

or«+'/,(a)^! \/air 

a being a real parameter which may be either positive or negative. Then the 
frequency of the value x is 


cr'=’+^/,Xtf)(7! a:! V27ryo 


which is 


^air+|v< + Jf) I 

77(fl)7! * xt 


a^I^+^(a-ha) 


giving the interesting identity 


-VWa + a) = ^! 



24.284a 


24 

TWO NEW PROPERTIES OF MATHEMATICAL
LIKELIHOOD *


AUTHOR’S NOTE 

From a logician’s point of view one of the most surprising results
obtained by the theory of estimation is that not only the mathematical
form of the inferences which can be rigorously drawn concerning the
unknown parameters of the populations sampled, from the frequencies
observed in a random sample, depends on the particular mathematical
specification of this population, but that the logical nature
of these inferences depends on this also.

The present paper is designed to illustrate the fact that, if one set
of functional conditions is satisfied, there will exist sufficient statistics,
while, if a second and distinct limitation is imposed on the problem,
estimates may be made exhaustive, and the small sample problem
solved with exactitude, by means of ancillary statistics.

Examples of each class are treated in detail, so that the reader may
grasp clearly the peculiarities of the likelihood function which each
implies. Both classes are of rather common occurrence, but beyond
them it would appear that it is not possible to derive exact statements
of fiducial probability from the primary inference supplied by
the relative likelihood of all possible combinations of parametric
values.

The particular contents of the paper are briefly sketched in the
summary, page 24.306.


* Reprinted from Proceedings of the Royal Society, Series A, Vol. 144, pp. 285-307,
1934.



24.285 


Two New Properties of Mathematical Likelihood.

By R. A. Fisher, F.R.S.


1. Introductory.

To Thomas Bayes* must be given the credit of broaching the problem of
using the concepts of mathematical probability in discussing problems of
inductive inference, in which we argue from the particular to the general;
or, in statistical phraseology, argue from the sample to the population, from
which, ex hypothesi, the sample was drawn. Bayes put forward, with
considerable caution, a method by which such problems could be reduced to the
form of problems of probability. His method of doing this depended essentially
on postulating a priori knowledge, not of the particular population of which our
observations form a sample, but of an imaginary population of populations
from which this population was regarded as having been drawn at random.
Clearly, if we have possession of such a priori knowledge, our problem is not
properly an inductive one at all, for the population under discussion is then
regarded merely as a particular case of a general type, of which we already
possess exact knowledge, and are therefore in a position to draw exact deductive
inferences.

To the merit of broaching a fundamentally important problem, Bayes 
added that of perceiving, much more clearly than some of his followers have 
done, the logical weakness of the form of solution he put forward. Indeed we 





24.286 


are told that it was his doubts respecting the validity of the postulate needed 
for establishing the method of inverse probability that led to his withholding 
his entire treatise from publication. Actually it was not published until after 
his death. 

If a sample of $n$ independent observations each of which may be classified
unambiguously in two alternative classes as “successes” and “failures”
be drawn from a population containing a relative frequency $x$ of successes,
then the probability that there shall be $a$ successes in our sample is, as was first
shown by Bernoulli,

\[ \frac{n!}{a!\,(n-a)!}\; x^a (1-x)^{n-a}. \qquad (1) \]


This is an inference, drawn from the general to the particular, and expressible 
in terms of probability. We are given the parameter x, which characterizes 
the population of events of which our observations form a sample, and from it 
can infer the probability of occurrence of samples of any particular kind. 

If, however, we had a priori knowledge of the probability, $f(x)\,dx$, that $x$
should lie in any specified range $dx$, or if, in other words, we knew that our
population had been chosen at random from the population of populations
having various values of $x$, but in which the distribution of the variate $x$ is
specified by the frequency element $f(x)\,dx$ of known form, then we might argue
that the probability of first drawing a population in the range $dx$, and then
drawing from it a sample of $n$ having $a$ successes, must be

\[ \frac{n!}{a!\,(n-a)!}\; x^a (1-x)^{n-a} f(x)\,dx; \qquad (2) \]

since this sequence of events has occurred for some value of $x$, the expression
(2) must be proportional to the probability, subsequent to the observation of
the sample, that $x$ lies in the range $dx$. The postulate which Bayes considered
was that $f(x)$, the frequency density in the hypothetical population of
populations, could be assumed a priori to be equal to unity.

As an axiom this supposition of Bayes fails, since the truth of an axiom
should be manifest to all who clearly apprehend its meaning, and to many
writers, including, it would seem, Bayes himself, the truth of the supposed
axiom has not been apparent. It has, however, been frequently pointed out
that, even if our assumed form for $f(x)\,dx$ be somewhat inaccurate, our
conclusions, if based on a considerable sample of observations, will not greatly be
affected; and, indeed, subject to certain restrictions as to the true form of
$f(x)\,dx$, it may be shown that our errors from this cause will tend to zero as the


24.287 


sample of observations is increased indefinitely. The conclusions drawn will
depend more and more entirely on the facts observed, and less and less upon
the supposed knowledge a priori introduced into the argument. This property
of increasingly large samples has been sometimes put forward as a reason for
accepting the postulate of knowledge a priori. It appears, however, more
natural to infer from it that it should be possible to draw valid conclusions from
the data alone, and without a priori assumptions. If the justification for any
particular form of $f(x)$ is merely that it makes no difference whether the form
is right or wrong, we may well ask what the expression is doing in our reasoning
at all, and whether, if it were altogether omitted, we could not without its aid
draw whatever inferences may, with validity, be inferred from the data. In
particular we may question whether the whole difficulty has not arisen in an
attempt to express in terms of the single concept of mathematical probability,
a form of reasoning which requires for its exact statement different though
equally well-defined concepts.

If, then, we disclaim knowledge a priori, or prefer to avoid introducing such
knowledge as we possess into the basis of an exact mathematical argument, we
are left only with the expression

\[ \frac{n!}{a!\,(n-a)!}\; x^a (1-x)^{n-a}, \]

which, when properly interpreted, must contain the whole of the information
respecting $x$ which our sample of observations has to give. This is a known
function of $x$, for which, in 1922, I proposed the term “likelihood,” in view of
the fact that, with respect to $x$, it is not a probability, and does not obey the
laws of probability, while at the same time it bears to the problem of rational
choice among the possible values of $x$ a relation similar to that which
probability bears to the problem of predicting events in games of chance. From the
point of view adopted in the theory of estimation, it could be shown, in fact,
that the value of $x$, or of any other parameter, having the greatest likelihood
possessed certain unique properties, in which such an estimate is unequivocally
superior to all other possible estimates. Whereas, however, in relation to
psychological judgment, likelihood has some resemblance to probability, the
two concepts are wholly distinct, in that probability is appropriate to a class of
cases in which uncertain inferences are possible from the general to the
particular, while likelihood is appropriate to the class of cases arising in the problem
of estimation, where we can draw inferences, subject to a different kind of
uncertainty, from the particular to the general.
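A small numerical illustration of the distinction may help. The sketch below is an addition with hypothetical names: it evaluates the binomial expression as a function of $x$ for a fixed sample, locates its maximum on a grid, and integrates it over $x$. The area comes out as $1/(n+1)$ rather than unity, which is one way of seeing that, with respect to $x$, the function does not behave as a probability distribution.

```python
import math


def likelihood(x, a, n):
    """Binomial expression n!/(a!(n-a)!) x^a (1-x)^(n-a), viewed as a function of x."""
    return math.comb(n, a) * x ** a * (1.0 - x) ** (n - a)


def argmax_on_grid(a, n, points=1001):
    """Value of x on an even grid over [0, 1] at which the likelihood is greatest."""
    grid = [i / (points - 1) for i in range(points)]
    return max(grid, key=lambda x: likelihood(x, a, n))


def area_under_likelihood(a, n, steps=20000):
    """Midpoint-rule integral of the likelihood over 0 <= x <= 1."""
    h = 1.0 / steps
    return h * sum(likelihood((j + 0.5) * h, a, n) for j in range(steps))


if __name__ == "__main__":
    a, n = 3, 10  # an arbitrary sample: 3 successes in 10 trials
    print(argmax_on_grid(a, n))         # the grid point a/n
    print(area_under_likelihood(a, n))  # close to 1/(n+1), not 1
```

Only ratios of likelihoods for different values of $x$ carry meaning here; the constant factor and the area are incidental.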



24.288 



The primary properties of likelihood in relation to the theory of estimation have
been previously demonstrated.* In the following sections I propose to exhibit
certain further properties arising when the functional properties of the
specification of the population fulfil certain special, but practically important,
conditions.

2. The Distribution of Sufficient Statistics.

The essential feature of statistical estimates which satisfy the criterion of
sufficiency is that they by themselves convey the whole of the information,
which the sample of observations contains, respecting the value of the
parameters of which they are sufficient estimates. This property is manifestly
true of a statistic $T_1$, if for any other estimate $T_2$ of the same parameter, $\theta$,
the simultaneous sampling distribution of $T_1$ and $T_2$ for given $\theta$, is such that
given $T_1$, the distribution of $T_2$ does not involve $\theta$; for if this is so it is obvious
that once $T_1$ is known, a knowledge of $T_2$, in addition, is wholly irrelevant;
and if the property holds for all alternative estimates, the estimate $T_1$ will
contain the whole of the information which the sample supplies.

This remarkable property will be possessed when

\[ \frac{\partial}{\partial\theta}\log L, \]

where $L$ is the likelihood of $\theta$ for a given sample of observations, is the same
function for all samples yielding the same estimate $T_1$; for on integrating the
expression above with respect to $\theta$, it appears that $\log L$ is the sum of two
components, one a function only of $\theta$ and $T_1$, and the other dependent on the
sample but independent of $\theta$. If

\[ f(T_1, T_2, \theta)\,dT_1\,dT_2 \]

is the frequency with which samples yield estimates simultaneously in the
ranges $dT_1$ and $dT_2$, it follows that

\[ f(T_1, T_2, \theta) = \phi_1(T_1, \theta)\cdot\phi_2(T_1, T_2); \]

where the first factor involves $T_1$ and $\theta$ only, and the second does not involve
$\theta$. The distribution of $T_2$ given $T_1$ will therefore be

\[ \phi_2(T_1, T_2)\,dT_2 \div \int \phi_2(T_1, T_2)\,dT_2, \]

* Fisher, ‘Proc. Camb. Phil. Soc.,’ vol. 22, p. 700 (1926).



24.289 



the integral being taken over all possible values of $T_2$, and in this expression
the parameter $\theta$ is seen not to appear.

The condition that $\partial L/\partial\theta$ should be constant over the same sets of samples
for all values of $\theta$, which has been shown to establish the existence of a
sufficient estimate of $\theta$, thus requires that the likelihood is a function of $\theta$,
which, apart from a factor dependent on the sample, is of the same form for all
samples yielding the same estimate $T$. The sufficiency of sufficient statistics
may thus be traced to the fact that in such cases the value of $T$ itself alone
determines the form of the likelihood as a function of $\theta$. If a conventional
value such as unity is given to the maximum likelihood for any sample, the
likelihood is thus expressible as a function of $\theta$ and $T$ only, if $T$ is the sufficient
estimate. We shall use this property in obtaining a general form for the
sampling distribution of sufficient statistics.

2.1. It will help if we take an illustrative example of this problem. Let the
element of frequency in a distribution be given by

\[ df = \frac{1}{\theta!}\, x^{\theta} e^{-x}\,dx, \]

where the variate $x$ can take any real value from 0 to $\infty$, and $\theta$ is an unknown
parameter greater than $-1$. Consider the problem of estimating $\theta$ from a
sample of $n$ values of $x$.

If $L$ is the likelihood of any possible value of $\theta$,

\[ \log L = -n\log\theta! + \theta\,S(\log x) - S(x), \]

and this is maximized for variations of $\theta$, when $\theta = T$, where

\[ \digamma(T) = \frac{1}{n}\,S(\log x), \]

and $\digamma(T)$ is the first differential of the logarithm of the factorial function.
This is the equation of estimation by the method of maximum likelihood. It
will be observed that apart from a constant factor the likelihood is expressible
as a function of $\theta$ and $T$ only, that is

\[ L = A \exp\{-n\log\theta! + n\theta\,\digamma(T)\}, \]

so that $T$ is evidently a sufficient estimate.
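The equation of estimation can be solved numerically, and doing so makes the sufficiency visible: two samples of the same size with the same mean logarithm give the same estimate, whatever their individual values. The sketch below is an illustration, not the author's procedure. It assumes that the first differential of $\log T!$ equals the ordinary digamma function evaluated at $T + 1$, approximates that function by the standard recurrence and asymptotic series, and solves by bisection; all names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0: step x up to at least 10 by the
    recurrence psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    acc = 0.0
    while x < 10.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series


def estimate_theta(xs):
    """Solve d/dT log T! = mean(log x) for T > -1 by bisection."""
    target = sum(math.log(x) for x in xs) / len(xs)
    lo, hi = -1.0 + 1e-12, 1.0
    while digamma(hi + 1.0) < target:  # widen the bracket if necessary
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if digamma(mid + 1.0) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Two different samples sharing the same sum of logarithms (log 4).
    print(estimate_theta([1.0, 4.0]))
    print(estimate_theta([2.0, 2.0]))
```

The sum $S(x)$ never enters the estimate, in line with the form of $\log L$ above, where it appears only in the factor independent of $\theta$.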

2.2. The sampling distribution of our estimate must evidently be derived
from that of the mean of the logarithms of the several values of $x$ in the sample.
Now the mean value of

\[ e^{\frac{it}{n}\log x} \]

is

\[ \int_0^\infty \frac{1}{\theta!}\, x^{\theta + \frac{it}{n}}\, e^{-x}\,dx = \frac{\left(\theta + \frac{it}{n}\right)!}{\theta!}. \]

By the familiar process of expanding the multiple integral in a product of
single integrals, the mean value over all samples of

\[ e^{it\,\digamma(T)} \]

is

\[ M = \frac{\left\{\left(\theta + \frac{it}{n}\right)!\right\}^n}{\{\theta!\}^n}, \]

and this is the characteristic function of

\[ \digamma(T), \]

from which its distribution may be inferred.

2.3. To determine the probability function knowing the characteristic function
$M(it)$ we may use the property of the sine integral

\[ \int_0^\infty \frac{\sin u}{u}\,du = \frac{\pi}{2}; \]

writing $kt$ for $u$ it appears that

\[ \frac{1}{\pi}\int_0^\infty \frac{\sin kt}{t}\,dt = \tfrac{1}{2} \]

when $k$ is positive, and $-\tfrac{1}{2}$ when $k$ is negative; or that

\[ \frac{1}{\pi}\int_0^\infty \{\sin (x-a)t - \sin (x-b)t\}\,\frac{dt}{t}, \]

when $b > a$, is unity when $b > x > a$, and zero when $x$ is less than $a$ or exceeds
$b$.

Consequently the Stieltjes integral

\[ \int_a^b df(x) = \frac{1}{\pi}\int_0^\infty \frac{dt}{t}\int_{-\infty}^{\infty} \{\sin (x-a)t - \sin (x-b)t\}\,df(x); \]

writing

\[ \sin (x-a)t = \frac{1}{2i}\left(e^{i(x-a)t} - e^{-i(x-a)t}\right) \]

gives us

\[ \int_a^b df(x) = \frac{1}{2\pi}\int_0^\infty \frac{dt}{it}\,\{e^{-iat} M(it) - e^{iat} M(-it) - e^{-ibt} M(it) + e^{ibt} M(-it)\}, \]

where $f(x)$ is the probability function of the variate $x$, and $M$ is its
characteristic function.

We may note that $M(it)$ and $M(-it)$ must be conjugate quantities, which
may be written $R \pm iI$, then

\[ -e^{-ibt} M(it) + e^{ibt} M(-it) = 2iR\sin bt - 2iI\cos bt, \]

so that the integral takes the real form

\[ \frac{1}{\pi}\int_0^\infty \frac{dt}{t}\,\{R(\sin bt - \sin at) - I(\cos bt - \cos at)\}. \]

Where the probability function is differentiable so that

\[ df(x) = y\,dx \]

then

\[ df(x) = y\,dx = \frac{dx}{2\pi}\int_0^\infty \{e^{-itx} M(it) + e^{itx} M(-it)\}\,dt \]

\[ = \frac{dx}{\pi}\int_0^\infty \{R\cos(tx) + I\sin(tx)\}\,dt. \]
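The real form of the inversion, $\frac{1}{\pi}\int_0^\infty \frac{dt}{t}\{R(\sin bt - \sin at) - I(\cos bt - \cos at)\}$ with $M(it) = R + iI$, can be tried on a case where the answer is known. The sketch below is an illustrative addition: it takes a normal variate with mean 0.7 and unit variance, whose characteristic function is $e^{i\mu t - \frac{1}{2}t^2}$, and compares the inverted probability with the value obtained from the error function. The test distribution, the truncation of the integral at t = 12 and the midpoint rule are all arbitrary choices, and the names are hypothetical.

```python
import math


def prob_between(a, b, R, I, upper=12.0, steps=20000):
    """Probability that a < x < b from the real and imaginary parts R(t), I(t)
    of the characteristic function, by the midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h  # midpoints avoid evaluating the integrand at t = 0
        total += (R(t) * (math.sin(b * t) - math.sin(a * t))
                  - I(t) * (math.cos(b * t) - math.cos(a * t))) / t
    return total * h / math.pi


MU = 0.7  # mean of the illustrative normal variate


def R_normal(t):
    return math.exp(-0.5 * t * t) * math.cos(MU * t)


def I_normal(t):
    return math.exp(-0.5 * t * t) * math.sin(MU * t)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    a, b = -0.5, 1.5
    print(prob_between(a, b, R_normal, I_normal))
    print(normal_cdf(b - MU) - normal_cdf(a - MU))
```

The two printed values should agree closely, since the integrand here is smooth and decays rapidly.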

2.4. For the sufficient statistic T, the sampling distribution will therefore 
be given by 

df = r M (*) * ; 

but 

e-«/iT> M ___ 


hence the distribution may be directly inferred from the nature of the likelihood 
function in the form 







h(O^it) 


dt 


where $\digamma'(T)$ stands for the second differential coefficient of $\log (T!)$.

We may illustrate the use of this formula by deriving the limiting forms for
extreme values of $\theta$.

Near the limit 6 — 1 the general expression 

nd(f(^)) r (e-f !\ 

271 ei* 




wliertj u stands for and the integral is taken from 1 — i oo to 1 -|~ t oo , or 
ill an open contour passing counter-clockwise to the right of the singularity, 
« ~ 0. Writing ^ for nuz we have 

n du f 

2ni J 

where the integral does not now involve the variate u, and is evidently 
2nil{n — 1) ! The distribution now appears as 

in which may be substituted for ?/. 

The probability integral in this case is given by the $\chi^2$ distribution for $2n$
degrees of freedom. Thus if a sample of 10 had been taken, we have 20 degrees
of freedom, and the 5% values are at $\chi^2 = 10.851$ and $31.410$.* Putting
$nu = \frac{1}{2}\chi^2$, the 5% values of $u$ are 0.5426 and 1.5705, whence those of $g - \gamma$ can
be obtained, showing that in 90% of samples $g$ will lie between $\gamma - 0.6110$
and $\gamma + 0.4514$. For given $g$, therefore, the fiducial probability is 5% that
$\gamma$ exceeds $g + 0.6110$, or falls short of $g - 0.4514$.
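The arithmetic of this example can be retraced in a few lines. The sketch is an addition under two assumptions: the tabulated values 10.851 and 31.410 are taken as quoted rather than recomputed, and $g - \gamma$ is taken to be the natural logarithm of $u$, as the quoted limits suggest. The function name is hypothetical.

```python
import math


def fiducial_offsets(n, chi2_lower, chi2_upper):
    """Return the limits of u = chi^2 / (2n) and their natural logarithms."""
    u_lower = chi2_lower / (2 * n)
    u_upper = chi2_upper / (2 * n)
    return u_lower, u_upper, math.log(u_lower), math.log(u_upper)


if __name__ == "__main__":
    u_lo, u_hi, d_lo, d_hi = fiducial_offsets(10, 10.851, 31.410)
    print(u_lo, u_hi)  # limits of u
    print(d_lo, d_hi)  # limits of g - gamma under the stated assumption
```

The upper offset reproduces the printed 0.4514 to four places; the lower comes out near 0.611, agreeing with the printed 0.6110 to three places.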

At the upper limit where $\theta \to \infty$ we may write

\[ T = g^2, \qquad \theta = \gamma^2, \]

* “Statistical Methods for Research Workers,” Table III.



24.293 



and since flr — y remains finite for finite values of the probability function 

log I = 

Y 

and tends to zero. The general expression for the distribution, which is 

*««f(/'(T)) r« (9 + il) I" - IT) J, 

27c ei" . ’ 

tends to the limiting form 

df = p e‘*‘ e“^* T-‘»' dt 

or substituting for T and 6 

df = f* «-“‘W dt 

ng ]_« 

= a/— 

V TP 

showing that $g$ tends in the limit to be normally distributed about the
population value $\gamma$ with variance $1/4n$. The 5% points of the distribution of
$g$ are therefore $\gamma \pm 1.645/\sqrt{4n}$, and for a given $g$ the 5% points of the
fiducial distribution of $\gamma$ are $g \pm 1.645/\sqrt{4n}$.

2.5. The interest of this form lies in the possibility of generalizing it for all
sufficient statistics. For, let the equation of maximum likelihood have a
solution

\[ \phi(T) = A, \]

where $A$ is a symmetric function of the observations not involving the
parameter $\theta$. The expression for $\frac{\partial}{\partial\theta}\log L$ must have been of the form

\[ C\{A\,\psi'(\theta) - \phi(\theta)\,\psi'(\theta)\}, \]

where the possible factor $C$, if not a constant, must be a function of the
observations which is expressible as a function of $A$, if the likelihood is to be expressible
as a function of $\theta$ and $T$ only.

The expression for $\log L$ then must be of the form

\[ CA\,\psi(\theta) - C\int \phi(\theta)\,d\psi(\theta) + B \]

* “Statistical Methods for Research Workers,” Table I.



24.294 



where $B$ is a function of the observations only. That $C$, a symmetric function
of all the observations, must be merely the number $n$ in the sample appears
from the fact that $\log L$ is the sum of expressions involving each observation
singly. Hence

\[ CA = S(X), \qquad B = S(X_1), \]

where $X$, $X_1$ are functions of the individual observations $x$. The likelihood
is now the product

^ - « J ^ (0) drif (9) ^ ^ j 

and 

^ (4^) ™ g-i<S(X) gHK,(4. + »n 

L (4> + it) 

where Fj (y) is written for J </> d^. 

But the frequency function of the variate $X$ was given by
g—F, (4>) ^x, ^ dX, 

dX ’ 

hence its characteristic function is 

M(iO 

while that of S (X) is the wth power of this expression, hence the probability 
that S (X) lies between Sg and is 

f df — ^ ~{e- M" (il) — e”**' M" (—*) — 6“ M" (it) + M" (—it)} 

J 2n J , it 

1 p dt f L (4^, Sg) _ , L(j;,s,) 1 

2n], il IL( 4 ^+ 2 V, Sg) it, Hg) L (4'+^, SJ L(^J;—it,Sj)/ 

this being the general expression for the probability of any sufficient statistic
falling within assigned limits; $S_1$ and $S_2$ being the limits of the known function
$n\phi(T)$ of the sufficient estimate $T$.

2.6. The property that where a sufficient statistic exists, the likelihood, apart
from a factor independent of the parameter to be estimated, is a function only
of the parameter and the sufficient statistic, explains the principal result
obtained by Neyman and Pearson in discussing the efficacy of tests of
significance. Neyman and Pearson introduce the notion that any chosen test of
a hypothesis $H_0$ is more powerful than any other equivalent test, with regard
to an alternative hypothesis $H_1$, when it rejects $H_0$ in a set of samples having


24.295 



an assigned aggregate frequency e when Hq is true, and the greatest possible 
»ggr®g»t© frequency when is true. 

If any group of samples can be found within the region of rejection whose
probability of occurrence on the hypothesis $H_1$ is less than that of any other
group of samples outside the region, but is not less on the hypothesis $H_0$, then
the test can evidently be made more powerful by substituting the one group
for the other. Consequently, for the most powerful test possible the ratio of
the probabilities of occurrence on the hypothesis $H_0$ to that on the hypothesis
$H_1$ is less in all samples in the region of rejection than in any sample outside it.
For samples involving continuous variation the region of rejection will be
bounded by contours for which this ratio is constant. The regions of rejection
will then be required in which the likelihood of $H_0$ bears to the likelihood of
$H_1$ a ratio less than some fixed value defining the contour.

The test of significance is termed uniformly most powerful with regard to a
class of alternative hypotheses if this property holds with respect to all of
them. This evidently requires that the contours defined by the ratio of the
likelihood of $H_1$ and $H_0$ shall be the same as those defined by the ratios of the
likelihood of any two hypotheses in the class. If, therefore, T' is a statistic
defining these contours, and $\theta_1, \theta_2, \ldots$ variable parameters defining the
hypothetical populations, the likelihood of any hypothesis must be expressed
in the form

$$L = A\,f(T', \theta_1, \theta_2, \ldots)$$

where A is a factor independent of the parameters.

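The method of estimation by maximum likelihood, when applied to the
form above, will yield equations for $\theta_1, \theta_2, \ldots$, etc.

$$\phi_1(T', \theta_1, \theta_2, \ldots) = 0,$$

$$\phi_2(T', \theta_1, \theta_2, \ldots) = 0;$$

where

$$\phi_1 = \frac{\partial}{\partial\theta_1}\log f, \quad \text{etc.,}$$

and the solutions of these will give estimates of $\theta_1, \theta_2, \ldots$, which we may
designate $T_1, T_2, \ldots$, in the form

$$T_1 = \psi_1(T'),$$

$$T_2 = \psi_2(T'), \quad \text{etc.}$$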

It is evident, at once, that such a system is only possible when the class of
hypotheses considered involves only a single parameter $\theta$, or, what comes to




the same thing, when all the parameters entering into the specification of the
population are definite functions of one of their number. In this case, the
regions defined by the uniformly most powerful test of significance are those
defined by the estimate of maximum likelihood, T. For the test to be uniformly
most powerful, moreover, these regions must be independent of $\theta$,
showing that the statistic must be of the special type distinguished as sufficient.
Such sufficient statistics have been shown to contain all the information which
the sample provides relevant to the value of the appropriate parameter $\theta$.
It is inevitable therefore that if such a statistic exists it should uniquely define
the contours best suited to discriminate among hypotheses differing only in
respect of this parameter; and it is surprising that Neyman and Pearson
should lay it down as a preliminary consideration that “the testing of statistical
hypotheses cannot be treated as a problem in estimation.” When tests are
considered only in relation to sets of hypotheses specified by one or more
variable parameters, the efficacy of the tests can be treated directly as the
problem of estimation of these parameters. Regard for what has been established
in that theory, apart from the light it throws on the results already
obtained by their own interesting line of approach, should also aid in treating
the difficulties inherent in cases in which no sufficient statistic exists.

3. A Second Class of Parameters for which Estimation need Involve no Loss of
Information.

In the case of sufficient statistics the likelihood function is, apart from a
constant factor, the same for all sets of observations which yield the same
estimate by the method of maximum likelihood. A second case, of somewhat
wider practical application, occurs when, although the sets of observations
which provide the same estimate differ in their likelihood functions, and therefore
in the nature and quantity of the information they supply, yet
samples alike in the information they convey exist for all values of the
estimate and occur with the same frequency for corresponding values of the
parameter.

The nature of the correspondence may be stated as follows: If $x_1, \ldots, x_n$
stands for a sample of n values of a variate x, the distribution of which is
conditioned by a parameter, $\theta$, then for any value of $\theta$, there will be a definite
probability

$$P(x, \theta)$$

of the occurrence of a variate less in value than x.




If, therefore, we take any other value of the parameter, say $\phi$, there will,
with continuous variates, always exist a series of observational values, y,
corresponding to the original series x, such that

$$P(y, \phi) = P(x, \theta).$$

The samples x and y will, however, only correspond in the sense required for
our present purpose if corresponding to any possible value, $\theta$, a value, $\phi$, can
be found so that the relationship above holds for all values of x. If, in fact, the
equation were solved for y, in the form

$$y = f(x, \theta, \phi)$$

it is required that f shall be of the form

$$f(x, \theta, \phi) = F(x, \Omega) \qquad (3)$$

where $\Omega$ is a function of $\theta$ and $\phi$, independent of the observations, and such
that for any possible values of $\theta$ and $\Omega$ there exists a corresponding value of $\phi$.
Stated symmetrically it is required that some function of x and y can be
equated to a function of $\theta$ and $\phi$.

The typical case of such a relationship occurs in parameters of location.
If the distribution of the variate x involves a parameter $\theta$, such that the
frequency with which x falls in any element dx of its range is a function of
$(x - \theta)$, then $\theta$ may be called a parameter of location. In such a case the
functional relationship (3) may be written

$$x - y = \theta - \phi$$

and is clearly of the form required.

Let us take an example in which there is no sufficient estimate, and in which
the loss of information in estimating the unknown parameter even by the
method of maximum likelihood is considerable. The distribution of x is a
double exponential curve, the probability of x falling in the range dx, being

$$\tfrac{1}{2}\,e^{-|x-\theta|}\,dx.$$

The logarithm of the likelihood is, apart from a constant,

$$-S\,|x-\theta|,$$

and this increases when $\theta$ is increased only if more observations are greater
than those less than $\theta$. The likelihood is therefore maximised if the number of
observations is odd, by equating $\theta$ to the median observation; if the number




of observations is even the likelihood is constant when $\theta$ has any value between
the two central observations.
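
The behaviour of the likelihood just described can be checked on a small numerical case. The sketch below is only an illustration: the two samples are hypothetical, and the grid search stands in for the argument from the sign of the slope given in the text.

```python
def log_likelihood(theta, xs):
    """Log-likelihood -S|x - theta| of the double exponential, constant dropped."""
    return -sum(abs(x - theta) for x in xs)


def slope(theta, xs):
    """Slope in theta: observations above theta minus observations below it."""
    return sum(x > theta for x in xs) - sum(x < theta for x in xs)


# Hypothetical sample with an odd number of observations (n = 5).
odd_sample = [0.3, -1.2, 2.5, 0.9, 4.1]
median = sorted(odd_sample)[len(odd_sample) // 2]

# Search a grid of step 0.01 for the value of theta with the greatest likelihood.
grid = [i / 100 for i in range(-300, 601)]
best = max(grid, key=lambda th: log_likelihood(th, odd_sample))

# Hypothetical sample with an even number (n = 4): the likelihood is flat
# between the two central observations, 0.3 and 0.9.
even_sample = [0.3, -1.2, 2.5, 0.9]
flat_values = {round(log_likelihood(th, even_sample), 9) for th in (0.3, 0.5, 0.9)}

print(median, best, flat_values)
```

The slope changes sign exactly at the median of the odd sample, which is why the grid search lands on it.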

For a sample of an odd number, n = 2s + 1, of observations, the sampling
distribution of the median is determinate, and the loss of information, if we
use the median as an estimate, unsupplemented by the ancillary information
which the sample contains, may be calculated. For, if the central observation
lies at a distance u from the centre of the distribution, u being supposed positive,
then the s highest values observed must each have fallen in a region comprising
only

$$\tfrac{1}{2}\,e^{-u}$$

of the total frequency, while the s lowest values have fallen in the remaining
region comprising

$$1 - \tfrac{1}{2}\,e^{-u}$$

of the total. Finally, the probability of the median itself falling in the range
du is

$$\tfrac{1}{2}\,e^{-u}\,du,$$

so that compounding the independent probabilities into which the event has
been analysed we have

$$\frac{(2s+1)!}{(s!)^2\,2^{2s+1}}\;e^{-(s+1)u}\,(2-e^{-u})^{s}\,du$$

as the probability of the median having a positive sampling error, u. As s is
increased without limit we may write

$$u\sqrt{n} = t,$$

and the distribution tends to the limit

$$\frac{1}{\sqrt{2\pi}}\;e^{-\frac{1}{2}t^2}\,dt.$$


The amount of information derivable from a large sample of n thus tends to
equality with n as the size of the sample is increased. Since the information
supplied by the independent observations is additive,* each must supply one
unit, and a sample of 2s + 1 observations must contain 2s + 1 units of information.
The quantity elicited by using the median, i.e., by replacing the 2s + 1
observations from the distribution

$$\tfrac{1}{2}\,e^{-|x-\theta|}\,dx$$

by a single observation from the distribution

$$\frac{(2s+1)!}{(s!)^2\,2^{2s+1}}\;e^{-(s+1)u}\,(2-e^{-u})^{s}\,du,$$

may be calculated from the mean value of

$$\left\{\frac{\partial}{\partial u}\log\frac{df}{du}\right\}^2,$$

or of

$$\left\{\frac{s\,e^{-u}}{2-e^{-u}} - (s+1)\right\}^2.$$

* Fisher, ‘Proc. Camb. Phil. Soc.,’ vol. 22, p. 700 (1925).


When s exceeds unity the average values may be evaluated from the consideration
that

$$\int_0^\infty \frac{(2s+1)!}{(s-1)!\,(s+1)!\,2^{2s+1}}\;e^{-(s+2)u}\,(2-e^{-u})^{s-1}\,du$$

represents the probability that at least s + 2 observations have positive, and
only s − 1 observations negative, deviations, and may therefore be equated to

$$\frac{1}{2} - \frac{(2s+1)!}{s!\,(s+1)!\,2^{2s+1}}.$$

The mean value of $e^{-u}/(2-e^{-u})$ is therefore found to be

$$\frac{s+1}{s}\left\{1 - \frac{(2s+1)!}{s!\,(s+1)!\,2^{2s}}\right\}.$$

Similarly, the mean value of $e^{-2u}/(2-e^{-u})^2$ is

$$\frac{(s+1)(s+2)}{s\,(s-1)}\left\{1 - \frac{(2s+1)!}{s!\,(s+1)!\,2^{2s}} - \frac{(2s+1)!}{(s-1)!\,(s+2)!\,2^{2s}}\right\}.$$

The amount of information provided by the median of 2s + 1 observations
is therefore

$$(s+1)^2 - 2(s+1)^2\left\{1 - \frac{(2s+1)!}{s!\,(s+1)!\,2^{2s}}\right\} + \frac{s\,(s+1)(s+2)}{s-1}\left\{1 - \frac{(2s+1)!}{s!\,(s+1)!\,2^{2s}} - \frac{(2s+1)!}{(s-1)!\,(s+2)!\,2^{2s}}\right\}$$



or

$$\frac{(s+1)(2s+1)}{s-1}\left\{1 - \frac{(2s)!}{(s!)^2\,2^{2s-1}}\right\}.$$

In the special case, s = 1, the general method fails, and a direct integration
yields the value

$$12\,(\log 2 - \tfrac{1}{2}),$$

which is the limit to which the general expression tends as s → 1.
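
As a check on the closed expression, the information in the median can also be obtained by direct numerical integration of the squared logarithmic derivative over the sampling distribution of u. The following is a minimal sketch under stated assumptions: the function names, the integration range and the use of Simpson's rule are choices made here for illustration and are not part of the original paper.

```python
from math import comb, exp, log


def median_information(s):
    """Closed form for the median of n = 2s + 1 observations (needs s >= 2)."""
    return (s + 1) * (2 * s + 1) / (s - 1) * (1 - comb(2 * s, s) / 2 ** (2 * s - 1))


def median_information_numeric(s, upper=40.0, steps=40000):
    """Simpson's rule for 2 * integral over (0, upper) of density * score^2."""
    # comb(2s, s) * (2s + 1) equals (2s + 1)! / (s!)^2.
    k = comb(2 * s, s) * (2 * s + 1) / 2 ** (2 * s + 1)

    def integrand(u):
        v = exp(-u)
        density = k * v ** (s + 1) * (2 - v) ** s
        score = s * v / (2 - v) - (s + 1)
        return density * score ** 2

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    # Positive and negative errors contribute equally, hence the factor 2.
    return 2 * total * h / 3


for s in (2, 3, 10):
    print(s, median_information(s), median_information_numeric(s))

# Special case s = 1, where only the direct integration applies.
print(median_information_numeric(1), 12 * (log(2) - 0.5))

# Information lost with s = 314, that is n = 629 observations.
print(2 * 314 + 1 - median_information(314))
```

The two columns agree for each s tried, and the numerical value for s = 1 agrees with the directly integrated expression.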

The median is an efficient estimate in the sense of the theory of large samples,
for the ratio of the amount of information supplied to the total available tends
to unity as the sample is increased. Nevertheless, the absolute amount lost
increases without limit. As s increases, this amount lost,

$$2s + 1 - \frac{(s+1)(2s+1)}{s-1}\left\{1 - \frac{(2s)!}{(s!)^2\,2^{2s-1}}\right\},$$

may be replaced by

$$\frac{2\,(2s+1)}{s-1}\left\{\frac{s+1}{\sqrt{\pi\,(s+\frac{1}{4})}} - 1\right\}$$

approximately, or by $4\,(\sqrt{s/\pi} - 1)$. Thus with s = 314, for a sample of 629
observations, the loss of information is near to 36 units, or the value of about
36 observations.

It is a matter of no great practical urgency, but of some theoretical importance,
to consider the process of interpretation by which this loss can be
recovered. Evidently, the simple and convenient method of relying on a
single estimate will have to be abandoned. The loss of information has been
traced to the fact that samples yielding the same estimate will have likelihood
functions of different forms, and will therefore supply different amounts of
information. When these functions are differentiable successive portions of
the loss may be recovered by using as ancillary statistics, in addition to the
maximum likelihood estimate, the second and higher differential coefficients
at the maximum. In general we can only hope to recover the total loss, by
taking into account the entire course of the likelihood function.

In our particular problem the curve of likelihood is a succession of exponential
arcs, having n discontinuities at the values of the n observations of the sample,
the exponent changing by −2, as each observation is passed in a positive
direction. For the same value of our estimate, the median observation, this
function will have very different forms according to the length of the intervals
which separate the median from its successive neighbours. Any samples,
however, in which these n − 1 intervals are the same will have the same





likelihood function. More explicitly the likelihood of the parameter having
a value $\phi$ as judged from the series of observations $y_1, \ldots, y_n$ will be equal to
the likelihood of its value being $\theta$ as judged from the series $x_1, \ldots, x_n$, if

$$x - y = \theta - \phi$$

for each pair of observations in the pair of samples.

We may specify the configuration of a sample by a series of positive
non-decreasing numbers $a_1, \ldots, a_s$ representing the positive deviations from the
median of the s largest observations, and a second series of positive
non-decreasing numbers $a'_1, \ldots, a'_s$ representing the excesses of the median over the
s smallest observations, so that if T is the median value the n observations
are represented by $T - a'_s, \ldots, T - a'_1,\; T,\; T + a_1, \ldots, T + a_s$.

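The probability of occurrence of any series of observations, the true centre
of the distribution being $\theta$, may now be written

$$n!\;L\;\frac{\partial(x_1, \ldots, x_n)}{\partial(T, a_1 \ldots a_s, a'_1 \ldots a'_s)}\;dT\,da_1 \ldots da_s\,da'_1 \ldots da'_s
= n!\;L\;dT\,da_1 \ldots da_s\,da'_1 \ldots da'_s,$$

where, if, for example, $\theta$ lies between $T - a'_p$ and $T - a'_{p-1}$,

$$L = \frac{1}{2^n}\;e^{-(2p-1)(T-\theta)\,-\,S(a+a')\,+\,2(a'_1 + \ldots + a'_{p-1})}.$$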


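Given the configuration of the sample, therefore, the probability that T
lies in a range dT, between the limits $\theta + a'_{p-1}$ and $\theta + a'_p$, is

$$df = \frac{1}{A}\;e^{2(a'_1 + \ldots + a'_{p-1})}\;e^{-(2p-1)(T-\theta)}\,dT,$$

of which the integral between these limits is

$$\frac{1}{(2p-1)\,A}\;e^{2(a'_1 + \ldots + a'_{p-1})}\left(e^{-(2p-1)\,a'_{p-1}} - e^{-(2p-1)\,a'_p}\right),$$

and A is equal to the sum of all such integrals

$$A = 1 - e^{-a'_1} + \tfrac{1}{3}\left(e^{-a'_1} - e^{2a'_1 - 3a'_2}\right) + \tfrac{1}{5}\left(e^{2a'_1 - 3a'_2} - e^{2a'_1 + 2a'_2 - 5a'_3}\right) + \ldots$$
$$\qquad +\; 1 - e^{-a_1} + \tfrac{1}{3}\left(e^{-a_1} - e^{2a_1 - 3a_2}\right) + \tfrac{1}{5}\left(e^{2a_1 - 3a_2} - e^{2a_1 + 2a_2 - 5a_3}\right) + \ldots$$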


Apart from the details of the analysis, however, it is apparent that if attention
is confined to samples having a given configuration the sampling distribution of
T for a given $\theta$ is found from the likelihood of $\theta$ for a given T, the probability
curve in the first case being the mirror image of the likelihood curve in the
second.


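To evaluate the amount of information supplied by this distribution we must
evaluate the mean square of

$$\frac{\partial}{\partial\theta}\log\frac{df}{dT}.$$

Now, if T lies between $\theta + a'_{p-1}$ and $\theta + a'_p$,

$$\log\frac{df}{dT} = \text{const.} - (2p-1)(T-\theta),$$

so that in this case

$$\frac{\partial}{\partial\theta}\log\frac{df}{dT} = 2p - 1,$$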


and the amount of information supplied by our estimate, in conjunction with
a specification of the configuration of the sample from which it was obtained, is

$$\frac{1}{A}\left\{1 - e^{-a'_1} + 3\left(e^{-a'_1} - e^{2a'_1 - 3a'_2}\right) + 5\,(\ldots) + \ldots
\;+\; 1 - e^{-a_1} + 3\left(e^{-a_1} - e^{2a_1 - 3a_2}\right) + 5\,(\ldots) + \ldots\right\}.$$

This value will differ greatly from sample to sample. Thus, if $a_1$ and $a'_1$
were both large, so that the median lies in a considerable range otherwise
unoccupied by observations, the amount of information approaches unity;
at the other extreme if $a_s$ and $a'_s$ were both so small that $e^{-a}$ is near to unity,
then

$$A \to 2/(2s+1),$$

and the amount of information rises to $(2s+1)^2$, or $n^2$.
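
The dependence of this quantity on the configuration may be illustrated numerically. The sketch below evaluates A and the conditional amount of information for any given pair of gap series; the function names and the two example configurations are hypothetical, chosen only to exhibit the two extremes mentioned above.

```python
from math import exp, inf


def side_terms(gaps):
    """Yield (2p - 1, A times the p-th integral) for p = 1, ..., s + 1 on one side.

    `gaps` holds the non-decreasing deviations from the median on that side.
    """
    bounds = [0.0] + list(gaps) + [inf]
    running = 0.0  # holds 2 * (a_1 + ... + a_{p-1})
    for p in range(1, len(bounds)):
        m = 2 * p - 1
        lower, upper = bounds[p - 1], bounds[p]
        # Exponents are combined before exponentiation so that nothing overflows.
        yield m, (exp(running - m * lower) - exp(running - m * upper)) / m
        running += 2 * upper


def configuration_information(upper_gaps, lower_gaps):
    """Information in the median, given the configuration of the sample."""
    terms = list(side_terms(upper_gaps)) + list(side_terms(lower_gaps))
    a_norm = sum(weight for _, weight in terms)  # the quantity A
    return sum(m * m * weight for m, weight in terms) / a_norm


# Hypothetical configurations with s = 2, that is n = 5.
crowded = configuration_information([1e-9, 2e-9], [1e-9, 2e-9])
isolated = configuration_information([30.0, 31.0], [30.0, 31.0])
print(crowded, isolated)
```

With all observations crowded on the median the value is close to n² = 25; with the median isolated it is close to unity.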

To find the average value of the amount of information derivable from the
median, in conjunction with the configuration of the sample, we may note
that the probability for a given configuration that s + p observations shall
exceed, and s − p + 1 fall short of the true value is

$$\frac{1}{(2p-1)\,A}\left(e^{2a'_1 + 2a'_2 + \ldots - (2p-3)\,a'_{p-1}} - e^{2a'_1 + 2a'_2 + \ldots - (2p-1)\,a'_p}\right)$$

and that the amount of information is obtained by multiplying this probability
by $(2p-1)^2$ and adding for all values of p.

The average information for all configurations may, therefore, be found
from the total probability for all configurations that exactly s + p observations
shall exceed the true value; since the probability of exceeding $\theta$ is ½
independently for each observation, the probability is

$$\frac{n!}{2^n\,(s+p)!\,(s-p+1)!}$$

and this, multiplied by $(2p-1)^2$, and added for all values of p, will give the
average amount of information. The probabilities are the terms of the
expansion of $(\tfrac{1}{2} + \tfrac{1}{2})^n$,
and (2p − 1) is twice the deviation from the mean corresponding to each value
of p. The variance of the binomial is well known to be exactly ¼n, and the
average amount of information used is consequently found to be exactly n, equal
to the total amount known to be contained on the average in the sample.
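
The summation may be verified exactly for any particular sample size. The sketch below, in which the function name is an assumption made here, uses rational arithmetic so that the equality with n is exact rather than approximate.

```python
from fractions import Fraction
from math import comb


def average_information(s):
    """Sum of (2p - 1)^2 times the binomial probability, with n = 2s + 1."""
    n = 2 * s + 1
    total = Fraction(0)
    # s + p observations exceed the true value, so p runs from -s to s + 1.
    for p in range(-s, s + 2):
        probability = Fraction(comb(n, s + p), 2 ** n)
        total += probability * (2 * p - 1) ** 2
    return total


for s in (1, 2, 5, 314):
    print(2 * s + 1, average_information(s))
```

Each line prints the same number twice, since four times the binomial variance ¼n is n.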

The process of taking account of the distribution of our estimate in samples
of the particular configuration observed has therefore recovered the whole of
the information available. This process will seldom be so convenient as the
use of an estimate by itself, without reference to the configuration, for instead
of replacing the n observations by a single value, we now have to take account
of all their values individually. Actually, indeed, in this case only the central
group of values matters greatly, but in general the theoretical process illustrated
here uses the available information exhaustively, only at the expense of
abandoning the convenience of disregarding all properties of the sample beyond
the best estimate it can provide. The reduction of the data is sacrificed to its
complete interpretation.

The frequency distribution, which makes this complete interpretation
possible, is the mirror image of the likelihood function. Thus if $T_1$ is the
estimate (the median) derived from the actual sample observed, and $L(\theta - T_1)$
is the likelihood derived from this sample of any value of $\theta$, then the sampling
distribution of T for any value of $\theta$, in samples of the same configuration is
given by

$$df \propto L(\theta - T)\,dT.$$

This is an extremely simple derivation of the sampling distribution of the
estimate of maximum likelihood from the form of the likelihood function.

4. The Simultaneous Estimation of Location and Scaling. 

In a very frequent class of cases not only the origin but the scale of the
distribution is also represented by a parameter to be estimated from the
observations. The frequency element is then of the form

$$df = \phi(\xi)\,d\xi,$$

where

$$\xi = \frac{x - \theta_1}{\theta_2}.$$

In such cases it is obvious that the sample of values $\xi$ in relation to any
values $\alpha_1$ and $\alpha_2$ of the parameters corresponds in the sense of section 3 to the
sample of values of x in relation to the values $\theta_1 + \theta_2\alpha_1$ and $\theta_2\alpha_2$, and a double
series of samples exists corresponding to any sample observed.

The samples will have all the same configurations in the sense that supposing
any two observations of the sample, such as the lowest and the lowest
but one in value, have values a, a + b, then the other members of the sample
will be

$$a + b\,t_p, \qquad p = 1, \ldots, n - 2,$$

where the n − 2 values of $t_p$ specify the configuration, and are the same for
all samples of which the configuration is the same.

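The frequency element

$$L\,dx_1 \ldots dx_n,$$

giving the frequency with which the n observations fall within assigned values,
may then be replaced by

$$L\;\frac{\partial(x_1, \ldots, x_n)}{\partial(a, b, t_1, \ldots, t_{n-2})}\;da\,db\,dt_1 \ldots dt_{n-2},$$

where the Jacobian is simply

$$\begin{vmatrix}
1 & 0 & 0 & \ldots & 0 \\
1 & 1 & 0 & \ldots & 0 \\
1 & t_1 & b & \ldots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_{n-2} & 0 & \ldots & b
\end{vmatrix}$$

or $b^{n-2}$. The simultaneous frequency distribution of a and b is therefore given
by

$$df \propto L\,b^{n-2}\,da\,db.$$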


Now, it is evident that the estimates of $\theta_1$ and $\theta_2$ from such a sample will be

$$T_1 = a + \lambda b,$$

$$T_2 = \mu b,$$

where $\lambda$ and $\mu$ depend only on the configuration of the sample. Hence

$$\frac{\partial(T_1, T_2)}{\partial(a, b)} = \begin{vmatrix} 1 & 0 \\ \lambda & \mu \end{vmatrix} = \mu$$

and the distribution of these estimates in samples of the same configuration
will be

$$df \propto L\,T_2^{\,n-2}\,dT_1\,dT_2 \qquad (4)$$

where in L, $T_1 + u_p T_2$ is substituted for $x_p$, p = 1, ..., n, the n values of u
being known for the configuration observed.

If, therefore, we choose to take into account not merely the sampling
distribution of our estimates for samples of all configurations, distributions which
will involve, apart from the parameters of the population, only these two
statistics, but rather the special simultaneous distribution for the particular
configuration observed, we may obtain this special distribution directly from
the form of the likelihood function.

Since, moreover, the whole course of the likelihood function is taken into
account, it is, from this point of view evident that no information can be lost.
An independent analytical proof of this is as follows; it is equally applicable
to information in respect of $\theta_1$ and of $\theta_2$.

The information respecting $\theta_1$ contained in a single observation from the
distribution (4) is numerically equal to the average value of

$$\left\{\frac{\partial}{\partial\theta_1}\log L\right\}^2$$

for all values of $(T_1 - \theta_1)$ from −∞ to ∞, or, otherwise, to the average value of

$$\left\{S\,\frac{\partial}{\partial\theta_1}(\log f)\right\}^2$$

where $f(x - \theta_1)$ is the frequency of an observation falling in the range dx.
The average for all values of $T_1$ is, for any particular observation, the average
for all values of x. Now the average value of

$$\frac{\partial}{\partial\theta_1}(\log f)$$

is zero, for

$$\int f\,\frac{\partial}{\partial\theta_1}(\log f)\,dx = \frac{\partial}{\partial\theta_1}\int f\,dx,$$

which is zero, since the total frequency is unity, independent of $\theta_1$. But the
average value for all values of $(T_1 - \theta_1)$ and for all configurations including
variations of $T_2$, is the average value for all possible samples. We may apply
this principle to the expression

$$\left\{S\,\frac{\partial}{\partial\theta_1}(\log f)\right\}^2$$

when all the values of x are independent. Then the average value of the square
of the sums of n terms, independent and all having a mean value zero, is n
times the mean square of each of them, or n times the mean value of

$$\left\{\frac{\partial}{\partial\theta_1}(\log f)\right\}^2$$

for all values of x from −∞ to ∞, which is, by definition, the amount of
information supplied by a sample of n observations. Hence the average
amount of information respecting $\theta_1$ supplied by (4) for all configurations is
the entirety of that supplied by the data.

With respect to $\theta_2$, we require the average value of

$$\left\{\frac{\partial}{\partial\theta_2}\log L\right\}^2$$

for all values of $T_2$ from 0 to ∞. The average of this for all configurations and
for all values of $T_1$, again reduces to the mean value of

$$\left\{S\,\frac{\partial}{\partial\theta_2}(\log f)\right\}^2$$

for all values of x from −∞ to ∞, and so to the average amount of information
contained in a sample of n observations.

Summary.

(I) Reasons are given for the use of mathematical likelihood in problems 
of inductive inference. 

(II) When a statistic exists, satisfying the criterion of sufficiency, the 
likelihood function involves only that statistic. 

(III) An example is given of a sufficient statistic, and its sampling distribution
is expressed in terms of the likelihood function.

(IV) This property is generalized for all cases of simple estimation, where 
a sufficient statistic exists. 

(V) It is shown that these cases and only these supply tests of significance
of the kind termed by Neyman and Pearson “uniformly most powerful” with
regard to a class of alternative hypotheses.




(VI) Where no sufficient statistic exists the precision of estimation may in 
general be enhanced by the use of ancillary statistics. A class of cases is 
defined and illustrated in which the totality of the ancillary information supplied 
by the observations may be utilized. 

(VII) This process gives a very simple derivation of sampling distributions, 
in which there is no loss of information, even for small samples. 



25

THE FIDUCIAL ARGUMENT IN STATISTICAL 
INFERENCE 


AUTHOR’S NOTE 

A very simple exposition of the type of argument now made possible
through the development of exact tests of significance, in terms suitable
for students new to the subject. The contrast with the argument
of inverse probability is to be emphasised, since, if this is ignored,
the introduction of inconsistent axioms has its natural consequence
in leading to paradoxical conclusions. The example worked in Section III
has since been adequately tabulated (“The asymptotic approach
to Behrens’ integral with further tables for the d test of significance,”
Annals of Eugenics, Vol. XI, Pt. II, pp. 141-172, 1941).


Reprinted from Annals of Eugenics, Vol. VI, Pt. IV, pp. 391-398, 1935.





THE FIDUCIAL ARGUMENT IN STATISTICAL 
INFERENCE 

By R. A. FISHER, Sc.D., F.R.S.

I. The nature of fiducial probability

In a series of papers from 1930, the author has called attention to a form of argument, which 
seems to have been entirely overlooked by the classical writers on probability, but which 
arises naturally from the exact tests of significance, when the variate is tabulated in terms 
of the probability. This form of argument leads in certain cases to rigorous probability 
statements about the unknown parameters of the population from which the observational 
data are a random sample, without the assumption of any knowledge respecting their 
probability distributions a priori. For such deductions we need to know the exact sampling 
distributions of statistical estimates, calculable from the observations only, of the unknown 
parameters, and these distributions must be continuous. It was probably these restrictions 
which stood in the way of the recognition, by the early writers on probability, of a form of 
argument having both theoretical interest and practical value; for the problems of distribution
of which they possessed the exact solutions were nearly all discontinuous, being, like
the binomial expansion, and the many similar generating functions given by Laplace,
distributions of frequencies, rather than of continuously variable measurements, or functions
calculated from these. The exact treatment of the mean of a normal sample was first
given by “Student” in 1908, and since that time numerous exact solutions have become 
available, covering somewhat completely the problems connected with normally distributed 
variates, in addition to others of a more miscellaneous character. 

The form of argument is extremely simple and may be illustrated by applying it to
“Student’s” solution. If a sample of n observations, $x_1, \ldots, x_n$, has been drawn from a
normal population having a mean value $\mu$, and if from the sample we calculate the two
statistics

$$\bar{x} = \frac{1}{n}\,S(x)$$

and

$$s^2 = \frac{1}{n-1}\,S(x - \bar{x})^2,$$

where S stands for summation over the sample, “Student” has shown (1925) that the
quantity t, defined by the equation

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s},$$

is distributed in different samples in a distribution dependent only on the size of the
sample, n. It is possible, therefore, to calculate, for each value of n, what value of t will be
exceeded with any assigned frequency, P, such as 1 per cent, or 5 per cent. These values of
t are, in fact, available in existing tables (Fisher, 1925-34).




It must now be noticed that t is a continuous function of the unknown parameter, the
mean, together with observable values, $\bar{x}$, s and n, only. Consequently the inequality

$$t > t_1$$

is equivalent to the inequality

$$\mu < \bar{x} - s\,t_1/\sqrt{n},$$

so that this last inequality must be satisfied with the same probability as the first. This
probability is known for all values of $t_1$, and decreases continuously as $t_1$ is increased. Since,
therefore, the right-hand side of the inequality takes, by varying $t_1$, all real values, we may
state the probability that $\mu$ is less than any assigned value, or the probability that it lies
between any assigned values, or, in short, its probability distribution, in the light of the
sample observed.
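
The frequency with which the second inequality is satisfied can be examined by repeated sampling. The following is a sketch under stated assumptions: the population values, the number of trials and the function names are hypothetical, and 1.833 is the tabulated value of t exceeded with frequency 5 per cent for 9 degrees of freedom.

```python
import random
from math import sqrt


def fiducial_limit(sample, t1):
    """Right-hand side of the inequality: mean - s * t1 / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return mean - s * t1 / sqrt(n)


def frequency_mu_below_limit(mu, sigma, n, t1, trials, seed=0):
    """Proportion of normal samples in which mu falls below the limit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if mu < fiducial_limit(sample, t1):
            hits += 1
    return hits / trials


T_5_PER_CENT_9_DF = 1.833  # tabulated one-sided 5 per cent point, n = 10

# Hypothetical population: the frequency should not depend on mu or sigma.
print(frequency_mu_below_limit(3.0, 2.0, 10, T_5_PER_CENT_9_DF, 20000))
print(frequency_mu_below_limit(-50.0, 0.1, 10, T_5_PER_CENT_9_DF, 20000, seed=1))
```

Both runs give a frequency near 5 per cent, whatever values are assumed for the population.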

It is of some importance to distinguish such probability statements about the value of $\mu$,
from those that would be derived by the method of inverse probability, from any postulated
knowledge of the distribution of $\mu$ in the different populations which might have been
sampled. It is only when the idea is totally set aside that we are seeking an inverse
probability, that the meaning of fiducial probability is clearly apprehended. The inverse
probability distribution would specify the frequency with which $\mu$ would lie in any assigned
range $d\mu$, by an absolute statement, true of the aggregate of cases in which the observed
sample yielded the particular statistics $\bar{x}$ and s. This can be found by Bayes’ procedure, if
the prior distribution of $\mu$ is known. The distribution which we have obtained is independent
of all prior knowledge of the distribution of $\mu$, and is true of the aggregate of all
samples without selection. It involves $\bar{x}$ and s as parameters, but does not apply to any
special selection of these quantities. To distinguish it from any of the inverse probability
distributions derivable from the same data it has been termed the fiducial probability
distribution, and the probability statements which it embraces are termed statements of
fiducial probability. To attempt to define a prior distribution of $\mu$ which shall make the
inverse statements coincide numerically with the fiducial statements is really to slur over
this distinction between the meaning of statements of these two kinds.

It is necessary to emphasise also that statements similar to those of fiducial probability
can only represent the true state of knowledge derivable from the sample, if the statistics
used contain the whole of the relevant information which the sample provides. If, for
example, an estimate s', derived from the mean error, had been used in place of one derived
from the mean square error, and a quantity t' had been defined by the equation

$$t' = \frac{(\bar{x} - \mu)\sqrt{n}}{s'},$$

the distribution of t', like that of t, would depend only on the size of the sample; and
probability statements accurate for t' could be expressed in terms of $\mu$. The probability
distribution for $\mu$ obtained in this way would, of course, differ from that obtained from t,
and the probability statements derived from the two distributions would be discrepant.
There is, however, in the light of the theory of estimation, no difficulty in choosing between 





such inconsistent results, for it has been proved that, whereas s' uses only a portion of the
information utilised by s, on the contrary, s utilises the whole of the information used by
s', or indeed by any alternative estimate. To use s', therefore, in place of s would be
logically equivalent to rejecting arbitrarily a portion of the observational data, and basing
probability statements upon the remainder as though it had been the whole. Dr J. Neyman
has unfortunately attempted to develop the argument of fiducial probability in a way
which ignores the results from the theory of estimation, in the light of which it was originally
put forward. His proofs, therefore, purport to establish the validity of a host of probability
statements many of which are mutually inconsistent.

When, as with inferences respecting a single parameter based on the use of sufficient
statistics only, we obtain a unique probability distribution for that parameter, all possibility
of admitting mutually conflicting inferences is excluded, and the resulting distribution
may be properly termed the fiducial distribution of the parameter. The same would be
true of inferences concerning the simultaneous values of two or more parameters, if such a
unique simultaneous distribution could be obtained. Such a simultaneous distribution does
not in general follow by any simple generalisation of the argument from the probability
integral by which the distribution of a single parameter may be obtained, such as that
recently developed by Dr Neyman. It is the purpose of the present note to demonstrate the
possibility of thus completing the solution, in the simple class of problems which arise from
the normal distribution.


II. Posterior fiducial inferences 

As a preliminary we may consider the problem: Given the value of n observations
$x_1, \ldots, x_n$, drawn from a normal population, to find the fiducial distribution of an additional
observation, x, not yet made.

Two points may be noted. First, that, in the theory of inverse probability, problems of
this kind are only to be solved after the simultaneous distribution of the population
parameters has been obtained, and by means of this simultaneous distribution; whereas
by the fiducial argument they are solved directly. Second, that the concept of a fiducial
distribution is now being applied not to a parameter of the population, but to an observation,
the distribution of which is given in terms of such parameters. With these comments
we may now suppose that the observed sample has yielded statistics

$$\bar{x} = \frac{1}{n}\,S(x)$$

and

$$s^2 = \frac{1}{n-1}\,S(x - \bar{x})^2.$$

Let, now,

$$t = \sqrt{\frac{n}{n+1}}\;\frac{x - \bar{x}}{s}.$$

That t, so defined, is distributed, independently of the parameters of the population, in
“Student’s” distribution for n − 1 degrees of freedom, follows (Fisher, 1926) from the fact
that it is the ratio of a quantity $x - \bar{x}$, normally distributed about zero, to an estimate of its
standard error, statistically independent of $x - \bar{x}$, based on the mean square deviation
among the different members of the original sample.

It follows that the known probability that t should exceed any assigned value $t_1$ is the
probability that x should exceed the value

$$\bar{x} + s\,t_1\sqrt{\frac{n+1}{n}}.$$

“Student’s” distribution, with the factor $\sqrt{(n+1)/n}$, therefore provides the fiducial
distribution of

$$\frac{x - \bar{x}}{s},$$

in which the only unknown element is the future observation, x.
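
The statement may be examined by repeated sampling in the same manner as for the mean. In the sketch below the population values, the number of trials and the function names are hypothetical; 1.833 is again the tabulated value of t exceeded with frequency 5 per cent for 9 degrees of freedom.

```python
import random
from math import sqrt


def upper_limit_for_next(sample, t1):
    """Value that a further observation exceeds with the frequency attached to t1."""
    n = len(sample)
    mean = sum(sample) / n
    s = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return mean + s * t1 * sqrt((n + 1) / n)


def frequency_next_exceeds(mu, sigma, n, t1, trials, seed=0):
    """Proportion of trials in which a fresh observation exceeds the limit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        future = rng.gauss(mu, sigma)  # the additional observation, not yet made
        if future > upper_limit_for_next(sample, t1):
            hits += 1
    return hits / trials


# Hypothetical population; n = 10 gives 9 degrees of freedom.
print(frequency_next_exceeds(0.0, 1.0, 10, 1.833, 20000))
```

The observed frequency is near 5 per cent, no reference being made to the parameters in forming the limit.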

We may next consider the analogous, more general problem: Given a sample of n
observations yielding the statistics $\bar{x}$ and s, to find the fiducial frequency distribution of the
statistics $\bar{x}'$ and s' derived from a subsequent sample of n' observations.

In proceeding to the solution, we again avoid all reference to unknown and hypothetical
parameters, and develop the solution in terms of directly observable quantities only, by
obtaining two quantities t and z, the simultaneous distribution of which is known with
exactitude, and which are expressible in terms of the observable features of the two samples.

For this purpose we have

$$t = \frac{(\bar{x}'-\bar{x})\,\sqrt{nn'(n+n'-2)}}{\sqrt{n+n'}\;\sqrt{(n-1)s^2+(n'-1)s'^2}}$$

and

$$z = \log s - \log s'.$$

Then it is well known that, in the aggregate of pairs of samples drawn from such a population,
t is distributed in “Student’s” distribution for n + n' - 2 degrees of freedom, while z
is distributed in its own characteristic distribution, determined by the two numbers of
degrees of freedom,

$$n_1 = n-1, \qquad n_2 = n'-1.$$

The values of $\bar{x}'$ and s' are, of course, determined by the values of t and z to which they
lead, so that, substituting for t and z in their known distributions, we have for the simultaneous
fiducial distribution of $\bar{x}'$ and s' the following expression

$$\frac{\left(\tfrac12(n+n'-3)\right)!}{\left(\tfrac12(n-3)\right)!\,\left(\tfrac12(n'-3)\right)!\,\sqrt{\pi}}\cdot\frac{2\,(n-1)^{\frac12(n-1)}\,(n'-1)^{\frac12(n'-1)}\,s^{n-1}\,s'^{\,n'-2}\;d\bar{x}'\,ds'}{\sqrt{\dfrac1n+\dfrac1{n'}}\;\left\{(n-1)s^2+(n'-1)s'^2+\dfrac{nn'}{n+n'}(\bar{x}'-\bar{x})^2\right\}^{\frac12(n+n'-1)}}.$$

The general distribution has been established merely from the known simultaneous distribution
of t and z, and in itself gives the essential information respecting a subsequent
sample for the sake of which knowledge of the population parameters would, in the
traditional procedure, be sought. We may, however, now, as a particular case, allow n' to
tend to infinity, and in consequence, the statistics $\bar{x}'$ and s' to tend to the parametric values,
$\mu$ and $\sigma$. The simultaneous fiducial distribution of $\mu$ and $\sigma$ found in this way is


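$$\sqrt{\frac{n}{2\pi}}\;\frac{1}{\sigma}\,e^{-n(\mu-\bar{x})^2/2\sigma^2}\,d\mu\cdot\frac{1}{\left(\tfrac12(n-3)\right)!}\left\{\frac{(n-1)s^2}{2\sigma^2}\right\}^{\frac12(n-1)}e^{-(n-1)s^2/2\sigma^2}\;\frac{2\,d\sigma}{\sigma}.$$

The two parameters $\mu$ and $\sigma$ are not independently distributed, that of $\mu$ being distributed
for any given $\sigma$ with variance $\sigma^2/n$, but the marginal distribution of $\mu$ found by integrating,
with respect to $\sigma$, from 0 to infinity is, as might be expected,

$$\frac{\left(\tfrac12(n-2)\right)!}{\left(\tfrac12(n-3)\right)!\,\sqrt{\pi(n-1)}}\cdot\frac{\sqrt{n}\;d\mu}{s\left\{1+\dfrac{n(\mu-\bar{x})^2}{(n-1)s^2}\right\}^{\frac12 n}},$$

in accordance with the fact that

$$\frac{(\mu-\bar{x})\sqrt{n}}{s}$$

is distributed as is t for (n - 1) degrees of freedom. Similarly, the marginal distribution of $\sigma$ is

$$\frac{1}{\left(\tfrac12(n-3)\right)!}\left\{\frac{(n-1)s^2}{2\sigma^2}\right\}^{\frac12(n-1)}e^{-(n-1)s^2/2\sigma^2}\;\frac{2\,d\sigma}{\sigma},$$

in accordance with the fact that

$$\frac{(n-1)s^2}{\sigma^2}$$

is distributed as is $\chi^2$ for (n - 1) degrees of freedom.

It thus appears that, for the special case of the mean and variance of the normal distribution,
there is no difficulty in extending the notion of fiducial probability unambiguously
to the simultaneous distribution of two parameters.

In general, it appears that if statistics $T_1, T_2, T_3, \ldots$ contain jointly the whole of the
information available respecting parameters $\theta_1, \theta_2, \theta_3, \ldots$, and if functions $t_1, t_2, t_3, \ldots$ of
the T’s and $\theta$’s can be found, the simultaneous distribution of which is independent of
$\theta_1, \theta_2, \theta_3, \ldots$, then the fiducial distribution of $\theta_1, \theta_2, \theta_3, \ldots$ simultaneously may be found by
substitution. For a single statistic having a distribution dependent only on a single parameter,
of which it is a sufficient estimate, such a function is always provided by the
probability integral. Hence the generality of the method in univariate cases.
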

III. Application of the method to special problems 

The process of solution of the fiducial distributions of parameters, by the recognition of 
quantities of known distribution functionally related to them, is a powerful tool for the 
solution of a variety of problems which offer difficulties to other methods of approach. This 
may be illustrated in two such problems, which are occasionally of practical interest. 




(i) The difference between the means of two normally distributed populations

Let us suppose that a sample of n observations has yielded a mean, $\bar{x}$, and an estimated
variance of the mean, $s^2$, so that

$$\bar{x} = \frac{1}{n}\,S(x), \qquad s^2 = \frac{S(x-\bar{x})^2}{n(n-1)};$$

then we know that if $\mu$ is the mean of the population,

$$\mu = \bar{x} + st,$$

where t is distributed in “Student’s” distribution. Similarly, for the mean of a second
population, of which we have n' observations, we may write

$$\mu' = \bar{x}' + s't',$$

where t' is distributed in “Student’s” distribution for n' - 1 degrees of freedom, independently
of t. If now

$$\mu' - \mu = \delta, \qquad \bar{x}' - \bar{x} = d,$$

we find that

$$\epsilon = \delta - d = s't' - st;$$

and since s' and s are known, the quantity represented on the right has a known distribution,
though not one which has been fully tabulated. The equation may be written

$$\epsilon = \sqrt{s^2+s'^2}\,(t'\cos R - t\sin R),$$

where $\tan R = s/s'$, so that R is a known angle.



If t and t' be taken as the co-ordinates of a point on a plane, the frequency of the observations
falling within any area of the plane is calculable. The points for which $\epsilon$ has any given
value lie on a straight line, at a distance from the origin $\pm\,\epsilon/\sqrt{s^2+s'^2}$ and making an angle
R with the axis of t. The fiducial probability that $\epsilon$ exceeds any given value is the frequency
in the area above this line. If n and n' are both increased, the distribution of $\epsilon$ tends to be
normal and independent of R; when R is 0° or 90° the distribution is of “Student’s” form.
In general it involves n, n' and R, and for any chosen probability, therefore, requires a
table of triple entry.

Any fiducial distribution supplies a series of possible tests of significance. In this case,
since d is known, we may use $d + \epsilon$ to test the hypothesis that $\delta$ has the chosen value
zero. This is, in fact, the exact test for the significance of the difference, d, between the
observed means, equivalent to that given in 1929 by W.-V. Behrens.
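Since the distribution of $s't' - st$ is not fully tabulated, one way to see what the test involves is to approximate the frequency in the relevant area of the (t, t') plane by random sampling. The sketch below is an assumption-laden illustration, not the tabulation the text has in mind: the function names, the number of draws and the seed are all choices made here, and s, s' denote the estimated standard errors of the two means as defined above.

```python
import math
import random


def student_t(df, rng):
    """One draw of Student's t: a normal deviate over the root of chi-square/df."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square on df degrees of freedom
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def fiducial_prob_delta_exceeds(delta0, d, s, s_prime, n, n_prime,
                                draws=20000, seed=1):
    """Approximate the fiducial probability that delta = mu' - mu exceeds delta0.

    Uses delta - d = s' t' - s t, with t and t' independent Student variates
    on n - 1 and n' - 1 degrees of freedom respectively.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        eps = s_prime * student_t(n_prime - 1, rng) - s * student_t(n - 1, rng)
        if d + eps > delta0:
            hits += 1
    return hits / draws


# Hypothetical figures: observed difference d = 3.0, standard errors 1.0 and 2.0.
p_positive = fiducial_prob_delta_exceeds(0.0, 3.0, 1.0, 2.0, n=8, n_prime=12)
```

A value of `p_positive` close to 1 (or close to 0) would correspond to a significant difference between the observed means at the matching level.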


(ii) The variance of a normally distributed set of means

Suppose we have k samples of n observations each from equally variable populations, or
other such material suitable for the analysis of variance. Let the analysis obtained, apart
from excluded items, be as follows:

                   Degrees of     Sum of      Mean
                   freedom        squares     square
Among samples      n_1            A           a
Error              n_2            B           b


If a is significantly greater than b, as shown by the z test, where

$$2z = \log a - \log b,$$

there is reason to suppose that the means of the k populations sampled were not all equal.
In some such cases, though not in all, it is appropriate to suppose that these means themselves
constitute a sample from a normal population with unknown variance. Then the
test of significance will have indicated that this variance is significantly greater than zero.
Now if $\theta$ is the variance within samples, and $\phi$ the variance of the population of means,

$$B = \theta\chi_2^2, \quad\text{or}\quad b = \frac{\theta\chi_2^2}{n_2},$$

where $\chi_2^2$ is distributed as is the sum of the squares of $n_2$ independent normal deviates all
having unit variance. Similarly,

$$a = \frac{(\theta+n\phi)\,\chi_1^2}{n_1}.$$

Hence

$$n\phi = \frac{A}{\chi_1^2} - \frac{B}{\chi_2^2}.$$

But $\chi_1^2$ and $\chi_2^2$ are distributed independently in distributions of known form, hence the
distribution of $\phi$ may be calculated from their simultaneous distribution.



If $n_1/\chi_1^2$ and $n_2/\chi_2^2$ be taken as the coordinates of a point, then the set of points
consistent with any given value of $\phi$ lie on a straight line, making with the axis of $n_1/\chi_1^2$
an angle, the tangent of which is a/b, or $e^{2z}$. The fiducial probability of $\phi$ exceeding any
chosen value is the total frequency to the right of the corresponding line. The observed
value of z is significant if the parallel line through the origin has to the right of it some
95 or 99 per cent. of the total, according to the level of significance chosen. Lines which
do not strike the axis of abscissae to the right of the origin correspond to negative values of
$\phi$, and are without interest for the problem stated. For large values of $n_1$ and $n_2$, the frequency
distribution tends to a normal form, without correlation, centred on the point
(1, 1).
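The frequency to one side of such a line can be approximated by sampling the two independent $\chi^2$ variates. The sketch below assumes the relation is read as $n\phi = a\,n_1/\chi_1^2 - b\,n_2/\chi_2^2$ (equivalently $A/\chi_1^2 - B/\chi_2^2$); the function name, the number of draws and the seed are choices made for the illustration only.

```python
import random


def fiducial_prob_phi_exceeds(phi0, a, b, n1, n2, n, draws=20000, seed=1):
    """Approximate the fiducial probability that phi exceeds phi0.

    a, b   : mean squares among samples and for error
    n1, n2 : their degrees of freedom
    n      : number of observations in each sample
    Assumes n * phi = a * n1 / chi1^2 - b * n2 / chi2^2.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        x = n1 / rng.gammavariate(n1 / 2.0, 2.0)  # n1 / chi-square(n1)
        y = n2 / rng.gammavariate(n2 / 2.0, 2.0)  # n2 / chi-square(n2)
        if (a * x - b * y) / n > phi0:
            hits += 1
    return hits / draws


# Hypothetical analysis: a = 12.0 on 5 d.f., b = 3.0 on 24 d.f., samples of 5.
p_pos = fiducial_prob_phi_exceeds(0.0, 12.0, 3.0, 5, 24, 5)
```

With `phi0 = 0` the result is the frequency to the right of the line through the origin, which is the quantity compared with 95 or 99 per cent. in the significance test described above.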

Tabulation of the exact solution encounters in both problems the difficulty of triple
entry, which may be largely mitigated by the use of a harmonic series of values of $n_1$ and $n_2$
appropriate for asymptotic interpolation, as in the table of z. It is doubtful, too, whether
the exact tests will often differ materially from the simple approximate tests commonly
used. Nevertheless, a table of either kind, once constructed, would be probably of sufficiently
frequent use in enabling experimenters to form a rapid opinion as to what interpretations
of their data should be regarded as acceptable. They are here principally of interest
as illustrating the simplicity with which the fiducial argument may be applied to this type
of problem.

REFERENCES 

Bayes, T. (1763). “An essay towards solving a problem in the doctrine of chances.” Philos. Trans. 53, 370-418.

Behrens, W.-V. (1929). “Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen.” Landw. Jb. 68, 807-37.

Fisher, R. A. (1925-34). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Fisher, R. A. (1925). “Applications of ‘Student’s’ distribution.” Metron, 5, 90-104.

Fisher, R. A. (1930). “Inverse probability.” Proc. Camb. phil. Soc. 26, 528-35.

Fisher, R. A. (1933). “The concepts of inverse probability and fiducial probability referring to unknown parameters.” Proc. roy. Soc. A, 139, 343-8.

Fisher, R. A. (1934). “Two new properties of mathematical likelihood.” Proc. roy. Soc. A, 144, 285-307.

Neyman, J. (1934). “On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection.” J. R. Statist. Soc. 97, 558-625.

“Student” (1925). “New tables for testing the significance of observations.” Metron, 5, 105-8.





26 


THE LOGIC OF INDUCTIVE INFERENCE 


AUTHOR’S NOTE 

In giving this general explanatory account of advances in statistical 
methods at that time comparatively recent, the opportunity was 
taken to add a few novelties which might make the evening more in¬ 
teresting to those few among the audience who were already familiar 
with the general ideas. Of these the modern reader may still find 
interest in the second half of Example 1. The discussion here may 
serve to distinguish statements of fiducial probability proper from
the “confidence belts” based on tests of significance applied to discontinuous
data, which in reality represent inequality statements as
to fiducial probability.


* Reprinted from Journal of the Royal Statistical Society, Vol. XCVIII, Pt. I,
pp. 39-54, 1935.



The Logic of Inductive Inference.

By Professor R. A. Fisher, Sc.D., F.R.S. 


When the invitation of your Council was extended to me to address 
this Society on some of the theoretical researches with which I have 
been associated, I took it as an indication that the time was now 
thought ripe for a discussion, in summary, of the net effect of these 
researches upon our conception of what statistical methods are 
capable of doing, and upon the outlook and ideas which may usefully 
be acquired in the course of mathematical training for a statistical 
career. I welcomed also the invitation, personally, as affording an 
opportunity of putting forward the opinion to which I find myself 
more and more strongly drawn, that the essential effect of the 
general body of researches in mathematical statistics during the last 
fifteen years is fundamentally a reconstruction of logical rather than 
mathematical ideas, although the solution of mathematical problems 
has contributed essentially to this reconstruction. 

I have called my paper “The Logic of Inductive Inference.” It
might just as well have been called “On making sense of figures.”
For everyone who does habitually attempt the difficult task of making
sense of figures is, in fact, essaying a logical process of the kind we
call inductive, in that he is attempting to draw inferences from the
particular to the general; or, as we more usually say in statistics,
from the sample to the population. Such inferences we recognize
to be uncertain inferences, but it does not follow from this that they
are not mathematically rigorous inferences. In the theory of
probability we are habituated to statements which may be entirely
rigorous, involving the concept of probability, which, if translated
into verifiable observations, have the character of uncertain statements.
They are rigorous because they contain within themselves
an adequate specification of the nature and extent of the uncertainty 
involved. This distinction between uncertainty and lack of rigour, 
which should be familiar to all students of the theory of probability, 
seems not to be widely understood by those mathematicians who 
have been trained, as most mathematicians are, almost exclusively 
in the technique of deductive reasoning; indeed, it would not be 
surprising or exceptional to find mathematicians of this class ready 
to deny at first sight that rigorous inferences from the particular to 
the general were even possible. That they are, in fact, possible is, I
suppose, recognized by all who are familiar with the modern work.
It will be sufficient here to note that the denial implies, qualitatively, 
that the process of learning by observation, or experiment, must 
always lack real cogency. 

My second preliminary point is this. Although some uncertain 
inferences can be rigorously expressed in terms of mathematical 
probability, it does not follow that mathematical probability is an 
adequate concept for the rigorous expression of uncertain inferences 
of every kind. This was at first assumed; but once the distinction 
between the proposition and its converse is clearly stated, it is seen 
to be an assumption, and a hazardous one. The inferences of the 
classical theory of probability are all deductive in character. They 
are statements about the behaviour of individuals, or samples, or 
sequences of samples, drawn from populations which are fully known.
Even when the theory attempted inferences respecting populations, 
as in the theory of inverse probability, its method of doing so was to 
introduce an assumption, or postulate, concerning the population 
of populations from which the unknown population was supposed to 
have been drawn at random ; and so to bring the problem within the 
domain of the theory of probability, by making it a deduction from 
the general to the particular. The fact that the concept of probability 
is adequate for the specification of the nature and extent of uncer¬ 
tainty in these deductive arguments is no guarantee of its adequacy 
for reasoning of a genuinely inductive kind. If it appears in inductive
reasoning, as it has appeared in some cases, we shall welcome it
as a familiar friend. More generally, however, a mathematical
quantity of a different kind, which I have termed mathematical
likelihood, appears to take its place as a measure of rational belief
when we are reasoning from the sample to the population.

Mathematical likelihood makes its appearance in the particular 
kind of logical situation which I have termed a problem of estimation. 
In logical situations of other kinds, which have not yet been explored, 
possibly yet other means of making rigorous our uncertain inferences 
may be required. In a problem of estimation we start with a 
knowledge of the mathematical form of the population sampled, but 
without knowledge of the values of one or more parameters which 
enter into this form, which values would be required for the complete 
specification of the population; or, in other words, for the complete 
specification of the probabilities of the observable occurrences which 
constitute our data. The probability of occurrence of our entire 
sample is therefore expressible as a function of these unknown parameters,
and the likelihood is defined merely as a function of these
parameters proportional to this probability. The likelihood is thus
an observable property of any hypothesis which specifies the values
of the parameters of the population sampled. Neyman and Pearson
have attempted to extend the definition of likelihood to apply, not to
particular hypotheses only, but to classes of such hypotheses. With
this extension we are not here concerned. The best use I can make
of the short time at my disposal is to show how it is that a consideration
of the problem of estimation, without postulating any special
significance for the likelihood function, and of course without introducing
any such postulate as that needed for inverse probability,
does really demonstrate the adequacy of the concept of likelihood for
inductive reasoning, in the particular logical situation for which it
has been introduced.
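As a small illustration of the definition, the likelihood of a single unknown proportion can be evaluated over a grid of hypothetical values. The counts used here (10 occurrences in 13 trials) and all names are assumptions chosen for the sketch; the constant of proportionality is dropped, since the likelihood is defined only up to a factor independent of the parameter.

```python
def binomial_likelihood(p, successes, trials):
    """Likelihood of p, proportional to the probability of the observed sample."""
    return p ** successes * (1.0 - p) ** (trials - successes)


# Evaluate the likelihood for each hypothetical value of p on a fine grid.
grid = [i / 1000 for i in range(1, 1000)]
values = [binomial_likelihood(p, 10, 13) for p in grid]

# The grid value with the greatest likelihood lies next to 10/13.
p_hat = grid[values.index(max(values))]
```

Each grid value of p is a fully specified hypothesis, and its likelihood is computable from the sample alone, which is the sense in which the text calls it an observable property of the hypothesis.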

In the theory of estimation we proceed by building up a series of
criteria for judging the merits of the estimates arrived at by different
methods. Each criterion is thus a method of forming a judgment
that some one estimate or group of estimates is better than others.
An initial difficulty here arises, best expressed in the question, “Better
for what?” and it is remarkable that this preliminary difficulty does
not frustrate our enquiry. Whatever other purpose our estimate
may be wanted for, we may require at least that it shall be fit to use,
in conjunction with the results drawn from other samples of a like
kind, as a basis for making an improved estimate. On this basis, in
fact, our enquiry becomes self-contained, and capable of developing
its own appropriate criteria, without reference to extraneous or
ulterior considerations.

This logical characteristic of our approach naturally requires that
our edifice shall be built in two stories. In the first we are concerned
with the theory of large samples, using this term, as is usual, to mean
that nothing that we say shall be true, except in the limit when the
size of the sample is indefinitely increased; a limit, obviously, never
attained in practice. This part of the theory, to set off against the
complete unreality of its subject-matter, exploits the advantage that
in this unreal world all the possible merits of an estimate may be
judged exclusively from its variability, or sampling variance. In
the second story, where the real problem of finite samples is considered,
the requirement that our estimates from these samples may
be wanted as materials for a subsequent process of estimation is found
to supply the unequivocal criteria required. Let me sketch the two
stages, with special emphasis on the staircase, relegating all mathematical
demonstrations to the written word.

First, we may distinguish consistent from inconsistent estimates.
An inconsistent estimate is an estimate of something other than that
which we want an estimate of. If we choose any process of estimation,
and imagine the sample from which we make our calculations
to increase without limit, our estimate will usually tend, in the
special sense in which that word is used in statistics, to a limiting
value, which is some function of the unknown parameters. Our
method is then a consistent one for estimating this particular parametric
function, but would be inconsistent for estimating any different
function. The limiting value is easily recognized by inserting for
the frequencies in our sample their mathematical expectations.

Having satisfied ourselves that our method is consistent, we may
now confine our attention to the class of estimates which, as the
sample is increased without limit, tend to be distributed about their
limiting value in the normal distribution; that is, to the class to
which the theory of large samples is applicable. The normal distribution
has only two characteristics, its mean and its variance.
The mean determines the bias of our estimate, and the variance
determines its precision.

The consideration of bias need not detain us. With consistent
estimates it must tend to zero; if we wish to use our estimates for
tests of significance it is as well that it should tend to zero more
rapidly than $n^{-1/2}$. We can always adjust our estimate to make the bias
absolutely zero, but this is not usually necessary, for in estimating
any parameter we must remember that we are at the same time
estimating its reciprocal, or its square, or any other such function, and
zero bias in one of these usually implies bias of the order of $n^{-1}$ in the
others. This is therefore the normal rate for the bias to approach zero.

Variance is a more serious affair; for a knowledge of the variance
of our estimate does not provide us with any means for producing one
which shall be less variable. In the cases which we are considering
the variance falls off with increasing size of sample always ultimately
in inverse proportion to n. The criterion of efficiency is that the
limiting value of nV, where V stands for the variance of our estimate,
shall be as small as possible. The first point which needs mathematical
proof is that the limiting value of $1/(nV)$ is necessarily less than
or equal to a certain quantity, i, which is independent of the method
of estimation used.

To show that if T be an estimate of an unknown parameter $\theta$,
normally distributed with variance V, then the limit as $n \to \infty$, of
$1/(nV)$ cannot exceed a value, i, defined independently of methods of
estimation.

Let f stand for the frequency of a particular kind of observation,
$\phi$ for that of a particular kind of sample, and $\Phi$ for that of all the
kinds of sample which yield a particular value T of the statistic
chosen as an estimate. Then in general

$$\log\phi = S(\log f),$$

where S stands for summation over the sample; next

$$\Phi = \Sigma(\phi),$$

where $\Sigma$ stands for summation over the possible samples which yield
the same estimate; and finally

$$1 = \Sigma'(\Phi),$$

where $\Sigma'$ stands for summation over all possible values of the
statistic. When continuous variation is in question, symbols of
integration will replace the symbols of summation $\Sigma$ and $\Sigma'$.

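If T is distributed normally about $\theta$ with variance V,

$$\frac{\partial^2}{\partial\theta^2}\log\Phi = -\frac{1}{V}.$$

Since this is independent of T, we may take the average for all
values of T, and obtain

$$\frac{1}{V} = -\Sigma'\!\left(\frac{\partial^2\Phi}{\partial\theta^2}\right) + \Sigma'\!\left\{\frac{1}{\Phi}\left(\frac{\partial\Phi}{\partial\theta}\right)^2\right\}.$$

Hence

$$\frac{1}{V} = \Sigma'\!\left\{\frac{1}{\Phi}\left(\frac{\partial\Phi}{\partial\theta}\right)^2\right\},$$

since $\Sigma'(\Phi)$ is independent of $\theta$.

Now consider

$$x = \frac{1}{\phi}\,\frac{\partial\phi}{\partial\theta}$$

as a variate, among the samples which lead to the estimate T.
Each value of x occurs with frequency $\phi$, so the variance of x is

$$\frac{1}{\Phi}\,\Sigma\!\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2\right\} - \frac{1}{\Phi^2}\left(\frac{\partial\Phi}{\partial\theta}\right)^2;$$

but the variance of x is positive, or, in the limiting case, zero; in
taking the mean for all values of T it follows that

$$\Sigma'\Sigma\!\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2\right\} - \Sigma'\!\left\{\frac{1}{\Phi}\left(\frac{\partial\Phi}{\partial\theta}\right)^2\right\}$$

is positive or zero. In other words,

$$\frac{1}{V} \le \Sigma'\Sigma\!\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2\right\},$$
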

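where it is to be noted that the quantity on the right is the average
value for all possible samples of

$$\left(\frac{1}{\phi}\,\frac{\partial\phi}{\partial\theta}\right)^2,$$

and is therefore independent of the method of estimation. To
evaluate it we may note that

$$\Sigma'\Sigma\!\left\{\frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2\right\} = -\,\Sigma'\Sigma\!\left(\phi\,\frac{\partial^2}{\partial\theta^2}\log\phi\right),$$

which is the average value in all possible samples of

$$-\frac{\partial^2}{\partial\theta^2}\log\phi = -\,S\!\left(\frac{\partial^2}{\partial\theta^2}\log f\right),$$

or the average value for all possible individual observations of

$$-\,n\,\frac{\partial^2}{\partial\theta^2}\log f.$$

It appears then that, in large samples in which the statistic is
normally distributed,

$$\frac{1}{nV} \le i,$$

where i is the average value of

$$\left(\frac{1}{f}\,\frac{\partial f}{\partial\theta}\right)^2,$$

or, if $\Sigma$ stand for summation over all possible observations,

$$i = \Sigma\!\left\{\frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2\right\}.$$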

We shall come later to regard i as the amount of information
supplied by each of our observations, and the inequality

$$\frac{1}{V} \le ni$$

as a statement that the reciprocal of the variance, or the invariance,
of the estimate, cannot exceed the amount of information in the
sample. To reach this conclusion, however, it is necessary to prove
a second mathematical point, namely, that for certain estimates,
notably that arrived at by choosing those values of the parameters
which maximize the likelihood function, the limiting value of
$1/(nV)$ is equal to i.

Of the methods of estimation based on linear functions of the
frequencies, that with smallest limiting variance is the method of
maximum likelihood, and for this the limit in large samples of
$1/(nV)$ is equal to i.

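Let x stand for the frequency observed of observations having
probability of occurrence f and let m = nf be the expected frequency
in a sample of n. Consider any linear function of the frequencies,

$$X = S(kx),$$

the summation being for all possible classes of observations, occupied
or unoccupied.

If the coefficients k are functions of $\theta$, the equation,

$$X = 0,$$

may be used as an equation of estimation. This equation will be
consistent if

$$S(kf) = 0$$

for all values of $\theta$. Differentiating with respect to $\theta$ it appears that

$$S\!\left(f\,\frac{\partial k}{\partial\theta}\right) + S\!\left(k\,\frac{\partial f}{\partial\theta}\right) = 0.$$

Since the mean value of X is zero, the sampling variance of X is

$$S(k^2 m);$$

but as the sample is increased indefinitely, the error of estimation
bears to the sampling error of X the ratio

$$-\,\frac{1}{S\!\left(x\,\dfrac{\partial k}{\partial\theta}\right)}.$$

If, therefore,

$$\frac{1}{n}\,S\!\left(x\,\frac{\partial k}{\partial\theta}\right)$$

tends to a finite limit,

$$\frac{1}{n}\,S\!\left(m\,\frac{\partial k}{\partial\theta}\right) = S\!\left(f\,\frac{\partial k}{\partial\theta}\right),$$

the sampling variance of our estimate is

$$\frac{S(k^2 m)}{S^2\!\left(m\,\dfrac{\partial k}{\partial\theta}\right)},$$

or, using the condition for consistency,

$$\frac{S(k^2 f)}{n\,S^2\!\left(k\,\dfrac{\partial f}{\partial\theta}\right)}.$$
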


We may now apply the calculus of variations or simple differentiation
to find the functions k of $\theta$ which will minimize the sampling
variance. Since the variance must be stationary for variations of
each several value of k, the differential coefficients of the numerator
and the denominator, with respect to k, must be proportional for all
classes. Hence,

$$kf \propto \frac{\partial f}{\partial\theta},$$

which is satisfied by putting

$$k = \frac{1}{f}\,\frac{\partial f}{\partial\theta}.$$

This also satisfies the requirement that

$$S(kf) = 0$$

for all values of $\theta$. The equation of estimation

$$S\!\left(\frac{x}{f}\,\frac{\partial f}{\partial\theta}\right) = 0$$

is the equation of maximum likelihood. The limiting value of the
sampling variance given by the analysis above is

$$\frac{1}{n\,S\!\left\{\dfrac{1}{f}\left(\dfrac{\partial f}{\partial\theta}\right)^2\right\}} = \frac{1}{ni}.$$


The condition for the validity of the approach to the limit is seen
to be merely that i shall be finite. Cases where i is zero or infinite
can sometimes be treated by a functional transformation of the parameter;
other cases deserve investigation. The proof shows, in fact,
that where i is finite there really are I and no less units of information
to be extracted from the data, if we equate the information extracted
to the invariance of our estimate.
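The comparison of linear estimating equations can be checked numerically on a small case. The sketch below assumes a purely hypothetical three-class population with probabilities $\theta^2$, $2\theta(1-\theta)$, $(1-\theta)^2$, and reads the limiting variance of a consistent linear equation as $S(k^2 f)/\{n\,S^2(k\,\partial f/\partial\theta)\}$; the function names and the alternative set of coefficients are introduced only for illustration.

```python
def class_probs(theta):
    """Hypothetical three-class model: theta^2, 2 theta (1 - theta), (1 - theta)^2."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def class_derivs(theta):
    """Derivatives of the class probabilities with respect to theta."""
    return [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]


def n_times_variance(k, f, df):
    """Limiting n * variance, S(k^2 f) / S(k df)^2, for coefficients with S(k f) = 0."""
    numerator = sum(ki * ki * fi for ki, fi in zip(k, f))
    denominator = sum(ki * dfi for ki, dfi in zip(k, df)) ** 2
    return numerator / denominator


def information(f, df):
    """Amount of information per observation, i = S{(1/f) (df/dtheta)^2}."""
    return sum(dfi * dfi / fi for fi, dfi in zip(f, df))


theta = 0.3
f, df = class_probs(theta), class_derivs(theta)

k_ml = [dfi / fi for fi, dfi in zip(f, df)]       # maximum likelihood coefficients
k_alt = [1 - theta ** 2, -theta ** 2, -theta ** 2]  # uses the first class only

v_ml = n_times_variance(k_ml, f, df)    # equals 1 / i
v_alt = n_times_variance(k_alt, f, df)  # larger than 1 / i
```

Both sets of coefficients satisfy the condition for consistency, but only the maximum likelihood coefficients bring n times the variance down to 1/i.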

This quantity i, which is independent of our methods of estimation,
evidently deserves careful consideration as an intrinsic property
of the population sampled. In the particular case of error curves,
or distributions of estimates of the same parameter, the amount of
information of a single observation evidently provides a measure of
the intrinsic accuracy with which it is possible to evaluate that
parameter, and so provides a basis for comparing the accuracy of
error curves which are not normal, but may be of quite different
forms.

We are now in a position to consider the real problem of finite
samples. For any method of estimation has its own characteristic distribution
of errors, not now necessarily normal, and therefore its own
intrinsic accuracy. Consequently, the amount of information which it
extracts from the data is calculable, and it is possible to compare the
merits of different estimates, even though they all satisfy the criterion
of efficiency in the limit for large samples. It is obvious, too, that in
introducing the concept of quantity of information we do not want
merely to be giving an arbitrary name to a calculable quantity, but
must be prepared to justify the term employed, in relation to what
common sense requires, if the term is to be appropriate, and serviceable
as a tool for thinking. The mathematical consequences of
identifying, as I propose, the intrinsic accuracy of the error curve,
with the amount of information extracted, may therefore be summarized
specifically in order that we may judge by our pre-mathematical
common sense whether they are the properties it ought to
have.

First, then, when the probabilities of the different kinds of observa¬ 
tion which can be made are all independent of a particular parameter, 
the observations will supply no information about the parameter. 
Once we have fixed zero we can in the second place fix totality. In 
certain cases estimates are shown to exist such that, when they are 
given, the distributions of all other estimates are independent of the 
parameter required. Such estimates, which are called sufficient, con¬ 
tain, even from finite samples, the whole of the information supplied 
by the data. Thirdly, the information extracted by an estimate can 
never exceed the total quantity present in the data. And, fourthly, 
statistically independent observations supply amounts of information 
which are additive. One could, therefore, develop a mathematical 
theory of quantity of information from these properties as postulates, 
and this would be the normal mathematical procedure. It is, 
perhaps, only a personal preference that I am more inclined to 
examine the quantity as it emerges from mathematical investigations, 
and to judge of its utility by the free use of common sense, rather than 
to impose it by a formal definition. As a mathematical quantity 
information is strikingly similar to entropy in the mathematical theory 
of thermo-dynamics. You will notice especially that reversible 
processes, changes of notation, mathematical transformations if 
single-valued, translation of the data into foreign languages, or 
rewriting them in code, cannot be accompanied by loss of information; 
but that the irreversible processes involved in statistical estimation, 
where we cannot reconstruct the original data from the estimate we 
calculate from it, may be accompanied by a loss, but never by a gain. 
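The third property, that an irreversible reduction of the data can lose information but never gain it, can be seen in a simple case. The sketch below assumes Poisson observations with mean `lam` (a choice made here, not taken from the text) and compares the information per observation with that retained when only "zero" or "not zero" is recorded; the function names are illustrative.

```python
import math


def poisson_information(lam, terms=200):
    """Information per observation, S{(1/f) (df/dlam)^2}, summed over the counts."""
    total = 0.0
    f = math.exp(-lam)  # probability of the count x = 0
    for x in range(terms):
        # For the Poisson series df/dlam = f * (x / lam - 1).
        total += f * (x / lam - 1.0) ** 2
        f *= lam / (x + 1)  # probability of the next count
    return total


def grouped_information(lam):
    """Information when each observation is reduced to 'zero' or 'not zero'."""
    f0 = math.exp(-lam)
    d0 = -f0  # derivative of the probability of 'zero' with respect to lam
    return d0 * d0 / f0 + d0 * d0 / (1.0 - f0)


full = poisson_information(2.0)     # 1 / lam
reduced = grouped_information(2.0)  # 1 / (exp(lam) - 1), never greater than 1 / lam
```

Because independent observations supply additive amounts, a sample of n contains n times these quantities, and the proportion lost by the grouping is the same for every n.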

Having obtained a criterion for judging the merits of an estimate
in the real case of finite samples, the important fact emerges that,
though sometimes the best estimate we can make exhausts the
information in the sample, and is equivalent for all future purposes
to the original data, yet sometimes it fails to do so, but leaves a
measurable amount of the information unutilized. How can we
supplement our estimate so as to utilize this too? It is shown that
some, or sometimes all of the lost information may be recovered by
calculating what I call ancillary statistics, which themselves tell us
nothing about the value of the parameter, but, instead, tell us how
good an estimate we have made of it. Their function is, in fact,
analogous to the part which the size of our sample is always expected
to play, in telling us what reliance to place on the result. Ancillary
statistics are only useful when different samples of the same size can
supply different amounts of information, and serve to distinguish
those which supply more from those which supply less.

Example 1.

The use of ancillary statistics may be illustrated in the well-worn
topic of the 2 × 2 table. Let us consider such a classification as
Lange supplies in his study on criminal twins. Out of 13 cases
judged to be monozygotic, the twin brother of a known criminal is in
10 cases also a criminal; and in the remaining 3 cases he has not been
convicted. Among the dizygotic twins there are only 2 convicts
out of 17. Supposing the data to be accurate, homogeneous, and
unselected, we need to know with what frequency so large a disproportion
would have arisen if the causes leading to conviction had
been the same in the two classes of twins. We have to judge this
from the 2 × 2 table of frequencies.


Convictions of Like-sex Twins of Criminals.

                 Convicted.    Not Convicted.    Total.
Monozygotic          10              3             13
Dizygotic             2             15             17
Total                12             18             30

To the many methods of treatment hitherto suggested for the 
2X2 table the concept of ancillary information suggests this new 
one. Let us blot out the contents of the table, leaving only the 
marginal frequencies. If it be admitted that these marginal fre¬ 
quencies by themselves supply no information on the point at issue, 
namely, as to the proportionality of the frequencies in the body of the 
table, we may recognize the information they supply as wholly 
ancillary; and therefore recognize that we are concerned only with 
the relative probabilities of occurrence of the different ways in which
the table can be filled in, subject to these marginal frequencies.
These ways form a linear sequence completely specified by giving to
the number of dizygotic convicts the 13 possible values from 0 to 12.
The important point about this approach is that the relative fre¬ 
quencies of these 13 possibilities are the same whatever may be the 
probabilities of the twin brother of a convict falling into the four 
compartments prepared for him, provided that these probabilities 
are in proportion. 

For, suppose that, knowing him to be of monozygotic origin, the
probability that he shall have been convicted is p, it follows that
the probability that of 13 monozygotic (12 - x) shall have been
convicted, while (1 + x) have escaped conviction, is

$$\frac{13!}{(12-x)!\,(1+x)!}\; p^{12-x}(1-p)^{1+x}.$$

But, if we know that the probabilities are in proportion, the
probability of a criminal's brother known to be dizygotic being
convicted will also be p, and the probability that of 17 of these x
shall have been convicted and (17 - x) shall have escaped conviction
will be

$$\frac{17!}{x!\,(17-x)!}\; p^{x}(1-p)^{17-x}.$$

The probability of the simultaneous occurrence of these two events,
being the product of their respective probabilities, will therefore be

$$\frac{13!\;17!}{(12-x)!\,(1+x)!\,x!\,(17-x)!}\; p^{12}(1-p)^{18},$$

in which it will be noticed that the powers of p and 1 - p are
independent of x, and therefore represent a factor which is the same
for all 13 of the possibilities considered. In fact the probability of
any value of x occurring is proportional to

$$\frac{1}{(12-x)!\,(1+x)!\,x!\,(17-x)!},$$

and on summing the series obtained by varying x, the absolute
probabilities are found to be

$$\frac{13!\;17!\;12!\;18!}{30!}\cdot\frac{1}{(12-x)!\,(1+x)!\,x!\,(17-x)!}.$$

Putting x = 0, 1, 2, . . . the probabilities are therefore

$$\frac{13!\;18!}{30!}\left\{1,\ \frac{12\cdot 17}{2},\ \frac{12\cdot 11\cdot 17\cdot 16}{2!\,3!},\ \dots\right\}.$$






The significance of the observed departure from proportionality is
therefore exactly tested by observing that a discrepancy from proportionality
as great or greater than that observed, will arise, subject
to the conditions specified by the ancillary information, in exactly
3,095 trials out of 6,653,325, or approximately once in 2,150 trials.
The test of significance is therefore direct, and exact for small samples.
No process of estimation is involved.
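
The figure of 3,095 trials in 6,653,325 can be checked by direct enumeration. The following is only a minimal sketch in Python, not part of the original paper; the function and constant names are illustrative assumptions. It holds the margins of the table fixed and sums the exact probabilities of the three arrangements (0, 1 or 2 dizygotic convicts) that depart from proportionality at least as far as the table observed.

```python
from fractions import Fraction
from math import comb

# Margins of Lange's table: 13 monozygotic, 17 dizygotic, 12 convicted in all.
MONOZYGOTIC, DIZYGOTIC, CONVICTED = 13, 17, 12


def table_probability(x):
    """Exact probability that x of the 12 convicts are dizygotic, margins fixed."""
    ways = comb(MONOZYGOTIC, CONVICTED - x) * comb(DIZYGOTIC, x)
    return Fraction(ways, comb(MONOZYGOTIC + DIZYGOTIC, CONVICTED))


# Arrangements at least as extreme as the one observed have x = 0, 1 or 2.
tail = sum(table_probability(x) for x in range(3))

# The fraction is held in lowest terms; it equals 3095/6653325.
print(tail, float(tail))
```

Exact rational arithmetic is used so that the result can be compared with the fraction quoted in the text without rounding error.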

The use of the margins as ancillary information suggests a more 
general treatment. Had the hypothesis we wish to examine made 
the chances of criminality different for monozygotic and dizygotic 
twins, e.g. p in one case and p' in the other, the probability of
observing any particular value of x would have included an additional
factor

$$\left\{\frac{p'(1-p)}{p(1-p')}\right\}^{x} = \theta^{x},$$

so that the frequency distribution is determined by the parameter θ, and
for each value of θ we can make a test of significance by calculating
the probability,

$$\frac{1 + 102\,\theta + 2992\,\theta^{2}}{1 + 102\,\theta + \dots + 476\,\theta^{12}},$$

the ratio of the partial sum of the hypergeometric series to the
hypergeometric function formed by the entire series. This probability
rises uniformly as θ is diminished, and reaches 1 per cent.
when θ is just less than 0.48. We may thus infer that the observations
differ significantly, at the 1 per cent. level of significance, from any
hypothesis which makes θ greater than 0.4798. That is to say, that
any hypothesis, which is not contradicted by the data at this level
of significance, must make the ratio of criminals to non-criminals at
least 2.084 times as high among the monozygotic as among the
dizygotic cases.

Similarly, the probability rises to 5 per cent. when θ = 0.28496, so
that any hypothesis which is not contradicted by the data at the
5 per cent. level of significance must make the ratio of criminals to
non-criminals at least three and a half times as high among the 
monozygotic as among the dizygotic. 
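
The two limiting values quoted above, 0.4798 and 0.28496, can be verified numerically. What follows is a hedged sketch in Python rather than anything given in the paper: the names are assumptions, and θ is taken to be the ratio of the odds of conviction among dizygotic twins to those among monozygotic twins, as the interpretation in the text implies. The coefficients 1, 102, 2992, . . ., 476 are rebuilt from the fixed margins, and the ratio of the partial sum to the whole series is evaluated.

```python
from math import comb

# Coefficients 1, 102, 2992, ..., 476 of the series in theta; each term
# comb(13, 12 - x) * comb(17, x) is divisible by 13, so // is exact here.
COEFFS = [comb(13, 12 - x) * comb(17, x) // 13 for x in range(13)]


def tail_probability(theta, observed=2):
    """Chance of `observed` or fewer dizygotic convicts for odds ratio theta."""
    terms = [c * theta ** x for x, c in enumerate(COEFFS)]
    return sum(terms[: observed + 1]) / sum(terms)


for theta in (1.0, 0.4798, 0.28496):
    print(theta, tail_probability(theta))
```

Putting θ = 1 recovers the exact test for proportionality, while the other two values should return probabilities close to 1 per cent. and 5 per cent. respectively.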

This is not a probability statement about θ. It is a formally
precise statement of the results of applying tests of significance. If,
however, the data had been continuous in distribution, on the hypothesis
considered, it would have been equivalent to the statement
that the fiducial probability that θ exceeds 0.4798 is just one chance
in a hundred. With discontinuous data, however, the fiducial
argument only leads to the result that this probability does not
exceed 0.01. We have a statement of inequality, and not one of
equality. It is not obvious, in such cases, that, of the two forms 
of statement possible, the one explicitly framed in terms of prob¬ 
ability has any practical advantage. The reason why the fiducial 
statement loses its precision with discontinuous data is that the 
frequencies in our table make no distinction between a case in which 
the 2 dizygotic convicts were only just convicted, perhaps on venial 
charges, or as first offenders, while the remaining 15 had characters 
above suspicion, and an equally possible case in which the 2 convicts 
were hardened offenders, and some at least of the remaining 15 had 
barely escaped conviction. If we knew where we stood in the range 
of possibilities represented by these two examples, and had similar 
information with respect to the monozygotic twins, the fiducial 
statements derivable from the data would regain their exactitude. 
One possible device for circumventing this difficulty is set out in
Example 2. It is to be noticed that in this example of the fourfold
table the notion of ancillary information has been illustrated solely
in relation to tests of significance and fiducial probability. No
problem of estimation arises. If we want an estimate of θ we have
no choice but to take the actual ratio of the products of the frequencies
observed in opposite corners of the table.

Example 2.

On turning a discontinuous distribution, leading to statements of 
fiducial inequality, into a continuous distribution, capable of yielding 
exact fiducial statements, by means of a modification of experimental 
procedure. 

Consider the process of estimating the density of micro-organisms 
in a fluid, by detecting their presence or absence in samples taken at 
different dilutions. A series of dilutions is made up containing 
densities of organisms decreasing in geometric progression, the 
ratios most commonly used being tenfold and twofold. We will 
suppose, to simplify the reasoning, that the series is effectively 
infinite, in the sense that it shall be scarcely possible for the organism 
to fail to appear in the highest concentration examined, or for it to 
appear in the highest dilution. A number, s, of independent samples
are examined at each dilution. The dilution ratio we shall call a, 
and we shall suppose the dilutions to be numbered consecutively, 
with the number n increasing as dilution is increased. 

If ρ is the density of the organisms to be estimated, then the
density in the nth dilution, reckoned on the size of the sample taken,
is

$$m = \rho\, a^{-n}.$$

The chance of a sterile sample is, therefore,

$$p = e^{-m}.$$

The probability of securing t sterile and u fertile cultures at this
dilution will therefore be

$$\frac{s!}{t!\,u!}\; p^{t}(1-p)^{u},$$

and the probability of a complete series of observations specified by
$t_n$ and $u_n$ at each dilution will be

$$\prod_{n}\frac{s!}{t_n!\,u_n!}\; p_n^{\,t_n}(1-p_n)^{u_n}, \qquad p_n = e^{-\rho a^{-n}},$$

which, regarded as a function of ρ, gives the likelihood of any
particular value of the unknown density.
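
The likelihood just described can be evaluated for any proposed density. The sketch below is in Python and is an illustration only: the function name, the crude grid search and the observed counts are hypothetical assumptions, not data from the paper. Each dilution contributes a binomial term with sterile-sample chance exp(-ρ a^(-n)).

```python
from math import comb, expm1, log


def log_likelihood(rho, a, counts):
    """Log-likelihood of density rho > 0, given (n, sterile, fertile) per dilution."""
    total = 0.0
    for n, sterile, fertile in counts:
        m = rho * a ** (-n)                  # density in the nth dilution
        total += log(comb(sterile + fertile, sterile))
        total += -m * sterile                # log p, since p = exp(-m)
        if fertile:
            total += fertile * log(-expm1(-m))   # log(1 - p), stable for small m
    return total


# Hypothetical series: twofold dilutions, three samples at each dilution.
observed = [(0, 0, 3), (1, 0, 3), (2, 1, 2), (3, 3, 0), (4, 3, 0)]

# Crude grid search for the density that maximizes the likelihood.
grid = [0.1 * k for k in range(1, 201)]
best = max(grid, key=lambda rho: log_likelihood(rho, 2.0, observed))
print(best, log_likelihood(best, 2.0, observed))
```

Working with logarithms, and with `expm1` for the fertile term, avoids the loss of precision that would arise from forming 1 - exp(-m) directly at high dilutions.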

The form of the likelihood function, and therefore the amount of 
information supplied by a series of observations, depends very 
greatly on the distribution of the numbers of sterile and fertile 
samples in that part of the range of dilutions in which both occur. 
Thus, if there were three samples at each dilution, an experiment in
which all were fertile before the nth dilution, and all of the nth and
higher dilutions were sterile, would give a higher precision to the
estimate than if there were one sterile at the (n - 1)th dilution, and
one fertile at the nth. Consequently, it would be advantageous, if
possible, to take account of the configuration of the observed series,
that is, of the succession of numbers of sterile samples from the first
observed, irrespective of the particular dilution in which this appears,
as information ancillary to the interpretation of our estimate, which
itself must depend greatly on where the series starts.

The objection to doing this is that, for a given series of dilutions,
the frequency with which any particular configuration appears will
not be entirely independent of ρ, but will be a periodic function of
log ρ, since it evidently does not change when log ρ is increased or
diminished by a multiple of log a. In order to make these frequencies
entirely independent of ρ it is, however, sufficient that the particular
series of dilutions used should themselves be chosen at random by a
process equivalent to the following: A number, θ, is chosen at
random between 0 and 1. In the first dilution, instead of the
dilution ratio a we use the dilution ratio a^θ, using the dilution ratio a
for all subsequent dilutions. The probability of any particular configuration
occurring is now wholly independent of ρ, and, for any
configuration the probability of the first sterile sample being drawn
from the dilution

$$n + \theta = x$$

will be a continuous function of the variate

$$\log\rho - x\log a,$$

which can be completely calculated from the configuration. Consequently,
fiducial limits of any chosen probability could be calculated
for ρ, merely by observing at what dilution the first sterile sample
occurs. For any chosen values of a and s to be used in such tests,
the fiducial limits of the commoner configurations could be listed in
advance, so reducing the calculation to little more than looking up an
anti-logarithm. The artifice of varying the initial dilution in
accordance with a number chosen at random for each series thus
obviates the need for expressing our conclusions as to the fiducial
probability of any proposed density in the form of an inequality.

If we are satisfied of the logical soundness of the criteria developed,
we are in a position to apply them to test the claim that mathematical
likelihood supplies, in the logical situation prevailing in problems of
estimation, a measure of rational belief analogous to, though mathematically
different from, that supplied by mathematical probability
in those problems of uncertain deductive inference for which the
theory of probability was developed. This claim may be substantiated
by two facts. First, that the particular method of
estimation, arrived at by choosing those values of the parameters the
likelihood of which is greatest, is found to elicit not less information
than any other method which can be adopted. Secondly, the
residual information supplied by the sample, which is not included
in a mere statement of the parametric values which maximize the
likelihood, can be obtained from other characteristics of the likelihood
function; such as, if it is differentiable, its second and higher derivatives
at the maximum. Thus, basing our theory entirely on considerations
independent of the possible relevance of mathematical
likelihood to inductive inferences in problems of estimation, we seem
inevitably led to recognize in this quantity the medium by which all
such information as we possess may be appropriately conveyed.

To those who wish to explore for themselves how far the ideas so
far developed on this subject will carry us, two types of problem may
be suggested. First, how to utilize the whole of the information
available in the likelihood function. Only two classes of cases have
yet been solved. (a) Sufficient statistics, where the whole course
of the function is determined by the value which maximizes it, and
where consequently all the available information is contained in the
maximum likelihood estimate, without the need of ancillary statistics.
(b) In a second case, also of common occurrence, where there is no
sufficient estimate, the whole of the ancillary information may be
recognized in a set of simple relationships among the sample values,
which I have called the configuration of the sample. With these
two special cases as guides the treatment of the general problem might
be judged, as far as one can judge of these things, to be ripe for
solution.

Problems of the second class concern simultaneous estimation, 
and seem to me to turn on how we should classify and recognize the 
various special relationships which may exist among parameters 
estimated simultaneously. For example, it is easy to show that two 
parameters may be capable of sufficient estimation jointly, but not 
severally, because each estimate contributes the ancillary information
necessary to complete the other.

In considering the future progress of the subject it may be 
necessary to underline certain distinctions between inductive and 
deductive reasoning which, if unrecognized, might prove serious 
obstacles to pure mathematicians trained only in deductive methods, 
who may be attracted by the novelty and diversity of our subject. 

In deductive reasoning all knowledge obtainable is already latent 
in the postulates. Rigour is needed to prevent the successive 
inferences growing less and less accurate as we proceed. The con¬ 
clusions are never more accurate than the data. In inductive 
reasoning we are performing part of the process by which new 
knowledge is created. The conclusions normally grow more and more 
accurate as more data are included. It should never be true, though 
it is still often said, that the conclusions are no more accurate than 
the data on which they are based. Statistical data are always 
erroneous, in greater or less degree. The study of inductive reasoning 
is the study of the embryology of knowledge, of the processes by 
means of which truth is extracted from its native ore in which it is 
fused with much error. 

Secondly, rigour, as understood in deductive mathematics, is not 
enough. In deductive reasoning, conclusions based on any chosen 
few of the postulates accepted need only mathematical rigour to 
guarantee their truth. All statisticians know that data are falsified 
if only a selected part is used. Inductive reasoning cannot aim at a 
truth that is less than the whole truth. Our conclusions must be 
warranted by the whole of the data, since less than the whole may be 
to any degree misleading. This, of course, is no reason against the 
use of absolutely precise forms of statement when these are available. 
It is only a warning to those who may be tempted to think that the 
particular precise code of mathematical statements in which they 
have been drilled at College is a substitute for the use of reasoning 
powers, which mankind has probably possessed since prehistoric 
times, and in which, as the history of the theory of probability shows, 
the process of codification is still incomplete. 





27 

UNCERTAIN INFERENCE 


AUTHOR’S NOTE 

This paper, the opening lecture of the Harvard Tercentenary Con¬ 
ference, is an outline of the history leading to recent developments 
in the logic of inductive reasoning. The role of mathematics in this 
field has scarcely been appreciated by pure logicians, whose formula¬ 
tions take no account of the functional form of the problem. 

The author submits that the existence or non-existence of solu¬ 
tions, or, in general, the conditions of solubility, of the problem of the 
Nile stated on the last page, must supply the key to the nature of 
the inductive inferences possible in each type of problem. 


* Reprinted from Proceedings of the American Academy of Arts and Sciences,
Vol. 71, No. 4, pp. 245-258, 1936. 





UNCERTAIN INFERENCE* 
By Ronald Aylmer Fisher 


At a Tercentenary Celebration we shall do well to look both to the 
past and to the future. In undertaking to address a mathematical 
audience, at the present time, on the subject of Uncertain Inference 
my chief care will naturally be to set forth, at least in outline, those 
very recent advances which have resolved effectively and conclusively 
the doubts, confusions, and ambiguities which we can now see clouded 
the views, and arrested the progress, of those great predecessors to
whom our subject owes its gradual development. But just as, behind 
the Harvard of to-day, the fully developed alma mater where future 
generations of Americans will train their minds, and form their 
characters, we perceive the struggling college of the seventeenth 
century, without which this other could not have been what it is; 
so we can only gain a just perspective of my present topic by recalling 
the steps, some hesitating, some even false, by which men have 
come gradually to understand how their reason may be applied 
to uncertainties, yet applied with logical rigour, and how, in par¬ 
ticular, it may be applied to observational facts with all their limita¬ 
tions, their paucity in number and their imperfect precision, and 
yet draw from them precisely those inferences which the observations 
warrant. 

The first great step was the development of the concept of mathe¬ 
matical probability. Much as this word has since been misapplied, 
to the writers of the seventeenth and eighteenth centuries its meaning 
was plain and unequivocal. For centuries, no doubt, expectations had 
been deemed capable of evaluation. Expectations under wills, and 
expectations from uncompleted trading ventures, had been bought 
and sold. In games of chance such expectations seemed capable of 
rigorous calculation. The structure of the game, and its condition 
when broken off, made it possible to assign to each player a calculable 
fraction of the amount at stake. This fraction, the ratio of the ex¬ 
pectation to the prize which might be won, supplied the essentially 
new concept of probability. To Thomas Bayes, indeed, this was its 
definition. 

The idea of probability seems to have been an essentially new one 
in mathematical thought. So far as we know it was unknown to the
Greek and to the Islamic mathematicians. It was a concept sui
generis, rather like the notion of temperature in Physics, and it was
novel particularly in this, that it brought uncertain consequences 
within the domain of exact or rigorous thought. If the apparatus 
used in gambling were true or unbiassed, and were fairly used, the 
probabilities of the game could be calculated with exactitude. From 
this point in time there was no excuse for mathematicians to confuse 
rigour with certainty. In the discussions on probability the un¬ 
certainty remained an integral part of the situation, but the concept 
of probability allowed the nature and extent of this uncertainty to be 
specified with rigour. 

The possibilities of this situation were, of course, only slowly ap¬ 
preciated. Not until our own days has it been realized that the fact 
that some uncertain inferences are rigorously expressible in terms of 
probability does not imply that the same concept is capable of pro¬ 
viding an exact specification of the nature of uncertainty in all cases. 
We are now, indeed, familiar with logical situations of a different 
type which require to be specified in terms of mathematical likelihood; 
and there is, as yet, no assurance that even probability and likelihood 
together will suffice for the specification of every kind of logical un¬ 
certainty which may be profitably discussed. 

For centuries, however, it was assumed that if uncertain inferences 
were to be made they must be made in terms of mathematical proba¬ 
bility. It was, I believe, this assumption, more than any other factor, 
which has led to efforts to define probability in more general, and 
usually in psychological, terms, and has introduced infinite confusion 
into the use of this once well defined concept. 

Thomas Bayes’ paper of 1763 was the first attempt known to us to
rationalize the process of inductive reasoning. From time immemorial,
of course, men had reasoned inductively; sometimes, no doubt, well,
and sometimes badly, but the uncertainty of all such inferences from
the particular to the general had seemed to cast a logical doubt on the
whole process. By the middle of the eighteenth century, however,
experimental science had taken its first strides, and all the learned
world was conscious of the effort to enlarge knowledge by experiment, 
or by carefully planned observation. To such an age the limitations 
of a purely deductive logic were intolerable. Yet it seemed that 
mathematicians were willing to admit the cogency only of purely 
deductive reasoning. From an exact hypothesis, well defined in 
every detail, they were prepared to reason with precision as to its 
various particular consequences. But, faced with a finite, though
representative, sample of observations, they could make no rigorous
statements about the population from which the sample had been
drawn.

Bayes perceived the fundamental importance of this problem and
framed an axiom, which, if its truth were granted, would suffice to 
bring this large class of inductive inferences within the domain of the 
theory of probability; so that, after a sample had been observed, 
statements about the population could be made, uncertain inferences, 
indeed, but having the well-defined type of uncertainty characteristic 
of statements of probability. Bayes' technique in this feat is ingenious.
His predecessors had supplied adequate methods, given a well-defined
population, for stating the probability that any particular type of
sample might result. His problem was: given a particular kind of
sample, to state with what probability a particular type of population
might have given rise to it. He imagines, in effect, that the possible
types of population have themselves been drawn, as samples, from a
super-population, and his axiom defines this super-population with
exactitude. His problem thus becomes a purely deductive one to 
which familiar methods were applicable. 

There is one point for which Bayes is seldom given enough credit. 
He had doubts as to the necessary truth of his axiom. So serious were 
these doubts that he withheld his entire treatise from publication 
until they should be resolved; and it appears that they never were 
resolved, for his paper was published by his friends after his death. 

That Bayes’ axiom was designed to meet a real need is shown by 
the eagerness and rapidity with which his work became the common 
property of European mathematicians. Laplace, in particular, incorporated
it into the foundations of his “Théorie Analytique des
Probabilités,” cruelly twisting the definition of probability itself in
order to accommodate the doubtful axiom. It is certain that Laplace
had no appreciation of Bayes’ scientific caution. He says of Bayes,
“Et il y est parvenu d’une manière fine et très ingénieuse, quoiqu’un
peu embarrassée.”

Substantial errors are so rare in the history of mathematics that 
mathematicians are remarkably unsuspicious of the work of their 
greater predecessors. The illustrious authority of Laplace thus ex¬ 
plains in some sort why Bayes’ doctrine in its new dress was embodied 
without a query into the mathematical teaching of full two generations. 
To practical thinkers it seemed to meet a practical need. To mathe¬ 
maticians it appeared robed in the authority and in the analytic 
elegance of Laplace’s “Théorie.” To De Morgan in 1838 it was still
unquestioned gospel, and one of the great steps forward in the history
of his subject.

The first serious criticism was developed by Boole in his “Laws of
Thought” in 1854. In that extraordinary work Boole anticipated
many subsequent attempts to develop a symbolical logic, with particular
reference to problems in probability. He recognizes the
contradictions and inherent arbitrariness of Bayes’ axiom, as developed
by Laplace, and quite properly treats it as an attempt to
supply by hypothesis something which the data themselves lack.

He writes: “These results only illustrate the fact, that when the
defect of data is supplied by hypothesis, the solution will, in general,
vary with the nature of the hypotheses assumed; so that the question
still remains, only more definite in form, whether the principles of the
theory of probabilities serve to guide us in the election of such hypotheses.
I have already expressed my conviction that they do not.”
Boole gives fresh reasons and adds: “Still, it is with diffidence that I
express my dissent on these points from mathematicians generally, and
more especially from one who, of English writers, has most fully entered
into the spirit and the methods of Laplace; and I venture to hope
that a question, second to none other in the theory of probabilities in
importance, will receive the careful attention which it deserves.”

Boole’s criticism worked its effect only slowly. In the latter half of
the nineteenth century the theory of inverse probability was rejected
more decisively by Venn and by Chrystal, but so retentive is the
tradition of mathematical teaching that I may myself say that I
learned it at school as an integral part of the subject, and for some
years saw no reason to question its validity. Mathematicians were
averse from abandoning a theory, which often led to plausible conclusions,
and, above all, which they had nothing to replace. Its
acceptance as orthodox effectively concealed from the majority the
fact that, not a mere restatement in more accurate terms, but a
fundamentally new approach, was required. As late as 1908 we find
Edgeworth, vague but definitely defensive: “I submit that very
generally we are justified in assuming an equal distribution of a priori
probabilities over that tract of the measurable with which we are
here concerned.”

Why should a mathematician defend a procedure for which he can
say no more than that? And why, to take another example, should
Karl Pearson, a few years later (1920) put forward what he, and I
believe he alone, regarded as a proof of the disputed axiom? Such
stubborn unwillingness to abandon a false position, to admit ignorance,
and to start again, can only be due to mathematicians having so
seldom experience of situations which call for an orderly retreat!

The need for an exact procedure of inductive inference was essen¬ 
tially a practical one, and the means for meeting it were being prepared 
by mathematicians having practical interests beyond those discussed 
by specialists in the academic theory of probability. Let us turn to 
Gauss and the foundations of the theory of estimation. As is well
known, Gauss, at one time, developed his method of least squares by
a formulation identical with that now used in the method of maximum
likelihood, but which he justified as taking for the estimate the value
of the unknown which had the highest probability. That would be,
of course, the mode of its frequency distribution, if any such distribution
could be assigned to it. Later, as he explained in a letter to
Bessel, he let this argument fall into the background, through the
conviction that maximizing the probability was less important than
minimizing the injurious effects of the actual errors of estimation. To
measure these injurious effects by the square of the error he regarded
as arbitrary, though convenient. 

Modern research has reconciled the two aims discussed by Gauss. 
If, for any frequency distribution of a variable x,

$$df = y(x)\,dx,$$

where the frequency density y depends on some unknown parameter
θ, we calculate

$$\int \frac{1}{y}\left(\frac{\partial y}{\partial\theta}\right)^{2} dx$$

over all possible values of x; then this quantity is invariant for transformations
of x, and measures the amount of information which a single
observation x contains respecting θ. If x is itself an estimate of θ derived
from a sample, the expression measures the intrinsic accuracy of
an estimate having the sampling distribution given. For the particular 
and important case of the normal or Gaussian distribution, the 
intrinsic accuracy is the invariance, or the reciprocal of the mean 
square. Error curves of forms other than the Gaussian can then be 
compared in their precision. When this is done it appears that the 
estimate obtained by maximizing the likelihood is in general the one 
for which the intrinsic accuracy is greatest. 
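
The statement that, for the normal distribution, the intrinsic accuracy is the reciprocal of the mean square may be checked numerically. The following Python sketch is an illustration under stated assumptions, not a procedure from the lecture: the function names, the midpoint rule, the finite-difference step and the choice σ = 2 are all hypothetical. It evaluates the integral of (1/y)(∂y/∂θ)² for a normal density whose mean is θ.

```python
from math import exp, pi, sqrt


def normal_density(x, theta, sigma):
    """Normal frequency density with mean theta and standard deviation sigma."""
    return exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def information(density, theta, lo, hi, steps=20000, h=1e-5):
    """Midpoint-rule value of the integral of (1/y) (dy/dtheta)^2 over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        y = density(x, theta)
        # Central difference as a stand-in for the partial derivative in theta.
        dy = (density(x, theta + h) - density(x, theta - h)) / (2.0 * h)
        total += dy * dy / y * dx
    return total


sigma = 2.0
info = information(lambda x, t: normal_density(x, t, sigma), 0.0, -20.0, 20.0)
print(info, 1.0 / sigma ** 2)
```

With σ = 2 the computed value should agree closely with 1/σ² = 0.25; the range of integration is cut off at ten standard deviations on either side, beyond which the contribution is negligible.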

A knowledge of the likelihood function thus takes the place of 
knowledge of a probability distribution in that type of uncertain 
inference with which the theory of estimation is concerned. This
logical situation is one of wide occurrence in the discussion of scientific
theories of all kinds. It presupposes a hypothesis containing one or 
more arbitrary parameters. The hypothesis is capable of specifying 
the probability or frequency of occurrence, of each of the observational 
facts which can be distinguished. The probabilities of the observable 
occurrences are then functions of the parameters, and functions of 
known mathematical form. Only the values of the parameters are 
unknown. The theory of estimation discusses the advantages of the 
different methods by which these values can be estimated from an 
observational record. Clearly, there can be no operation properly 
termed “estimation,” until the parameter to be estimated has been
well defined, and this requires that the mathematical form of the 
distribution shall be given. Nevertheless, we need not close our eyes 
to the possibility that an even wider type of inductive argument may 
some day be developed, which shall discuss methods of assigning from 
the data the functional form of the population. At present it is only 
important to make clear that no such theory has been established. 

The direct assessment of the amount of information supplied by a 
body of data, the sample of observations, and by a parallel and 
independent process, of the amount of information extracted from the 
data, and contained in the estimate, brings to light the important 
fact that in some special, but specially important cases, these amounts 
are equal. The estimate exhausts the whole value of the data; once 
the estimate has been calculated, the remaining facts which the data 
provide are entirely irrelevant to the value of the unknown parameter. 
Their distributions are, in fact, independent of the value of this 
parameter, so that we have the enlightening situation, of which the 
arithmetic mean of a normal sample, or of a sample from a Poisson 
Series, are examples, in which, given the value of the first, or Sufficient 
estimate, the sampling distribution of any alternative estimate is 
independent of the quantity of which it is designed to indicate the 
value. All such alternative estimates are therefore worthless. The 
existence of Sufficient statistics, in the sense defined above, is not 
only of theoretical interest as a possibility, but of great practical 
importance, for the cases in which they exist cover many of the forms 
most used by statisticians in practice. 
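
The sufficiency property described above can be checked numerically in a small case. The sketch below is illustrative only: it assumes a sample of just two independent Poisson counts, and the function names are invented for the example. It shows that, once the total (and hence the mean) is given, the distribution of the first count does not involve the parameter at all.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Probability of the count k in a Poisson series with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

def conditional_pmf(x1, total, lam):
    """P(X1 = x1 | X1 + X2 = total) for two independent Poisson(lam) counts."""
    joint = poisson_pmf(x1, lam) * poisson_pmf(total - x1, lam)
    marginal = sum(poisson_pmf(k, lam) * poisson_pmf(total - k, lam)
                   for k in range(total + 1))
    return joint / marginal

total = 6
# The same conditional distribution is obtained whatever value lam takes.
for lam in (0.5, 2.0, 7.5):
    print(lam, [round(conditional_pmf(x1, total, lam), 6) for x1 in range(total + 1)])

# It coincides with the binomial distribution with index `total` and p = 1/2.
binomial = [comb(total, k) / 2**total for k in range(total + 1)]
print(binomial)
```

Any alternative estimate computed from the two counts is a function of the first count and the total, so its distribution, given the total, is likewise free of the parameter.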

Theoretically, however, the existence of sufficient statistics is exceptional,
dependent as it is from a special functional relationship.
When no sufficient statistic exists then no single estimate can contain
the whole of the information supplied by the sample. There appears
to be an inevitable loss, and, in these cases, the method of maximum
likelihood is only preeminent in making this loss as small as is possible.
The next task of the theory is to trace the cause of this loss, and to 
discover in what way it may be made good. 

Before turning to this fascinating enquiry, we must recall another 
development of modern mathematical statistics, in which again the 
practical requirements of research have moulded the mathematical 
structure. I refer to the establishment of exact tests of significance.
These are now somewhat numerous, and of many kinds, designed to
cover the various cases which commonly arise in practice. They are
all of quite recent origin, and I may take as typical the test of significance
of the mean of a normal sample. This was published in 1908,
which year, you may notice, is the same from which I have quoted
Edgeworth’s defense of inverse probability. Its author was a young
man, then unknown, who chose to publish under the now celebrated
pseudonym of “Student.”

The classical procedure, dating at least from the time of Gauss, for
testing the significance of the difference between the observed mean of
a normal sample, and zero, or any other value chosen for comparison,
is to divide the difference by its standard error, as estimated from the
sample. If x̄ is the observed mean of n observations, and μ the true
mean of the population from which the sample was drawn, then it
has long been known that x̄ is distributed in different samples in a
normal distribution, with its centre at μ, and having a variance one
nth of that of the population sampled. If, therefore, we knew the
true standard deviation, σ, of this population, we should know that

\[ \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \]

was distributed normally with unit variance, and so could assign with
exactitude the probability with which any chosen value would be
exceeded. In fact, the true value, σ, is not known, but we have in its
place an entirely satisfactory estimate, s, defined by,

\[ s^2 = \frac{1}{n-1}\, S(x - \bar{x})^2, \]

where S stands for summation over the sample. This estimate is, in
fact, a sufficient one; but it is, none the less, a fact that the value of s
arrived at will usually differ more or less from the true value, σ.
Consequently, if we substitute s for σ, and calculate

\[ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \]





we are not justified in asserting that t will be distributed in the normal
distribution. The originality of “Student’s” approach lay in enquiring
how in fact the ratio t is distributed, when calculated from
samples of n observations. The exact solution is found to be given by
the frequency element,

\[ df = \frac{\left(\frac{n-2}{2}\right)!}{\left(\frac{n-3}{2}\right)!\,\sqrt{\pi(n-1)}} \cdot \frac{dt}{\left(1 + \frac{t^2}{n-1}\right)^{n/2}}, \]

a distribution very different in mathematical character from the
Gaussian, though progressively approaching this form as n is indefinitely
increased. The distribution is, however, exact, and capable
of tabulation for each size of sample possible. It has, indeed, at
various times been rather thoroughly tabulated. Consequently, in
place of asserting that there is a probability of one chance in forty 
that 


\[ \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} > 1.960, \]


an assertion which would only be directly useful if σ were known with
exactitude, it is equally open to us, if, for example, our mean were
based on fifteen observations, to assert that

\[ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s} \]


has a probability of one in forty of exceeding the value 2.145. This 
statement is directly useful, for s is not unknown, but is calculable 
with exactitude from the observations. 
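
The two "one in forty" statements can be checked numerically. The following is a rough sketch, not a tabulation method: it assumes the usual form of Student's frequency element for a sample of n observations (n − 1 degrees of freedom) and integrates it by Simpson's rule; the function names are illustrative.

```python
from math import erfc, gamma, pi, sqrt

def student_density(t, n):
    """Frequency element of t for a sample of n observations (n - 1 degrees of freedom)."""
    nu = n - 1
    const = gamma(n / 2) / (gamma(nu / 2) * sqrt(pi * nu))
    return const * (1 + t * t / nu) ** (-n / 2)

def upper_tail(t0, n, steps=20000):
    """P(t > t0) for t0 >= 0: one half minus the Simpson integral of the density on [0, t0]."""
    h = t0 / steps                      # steps must be even for Simpson's rule
    total = student_density(0.0, n) + student_density(t0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_density(i * h, n)
    return 0.5 - total * h / 3

# Normal theory, usable only when the true standard deviation is known.
normal_tail = 0.5 * erfc(1.960 / sqrt(2))
# Student's distribution for a mean based on fifteen observations.
student_tail = upper_tail(2.145, 15)
# What the normal distribution would wrongly assign to the value 2.145.
naive_tail = 0.5 * erfc(2.145 / sqrt(2))

print(normal_tail, student_tail, naive_tail)
```

Both of the first two probabilities should come out close to 1/40, while the third is appreciably smaller, which is the error committed by treating t as normally distributed.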

Armed with this new tool, it was natural for practical experimenters
to take a further logical step of great theoretical importance, namely
to use the ratio, e.g., 2.145, appropriate to the level of significance
chosen, to multiply this by the standard error of the mean as estimated,
to add or subtract the product to or from the observed mean, and so
to obtain working limits for the values of the unknown mean of the
population.
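
As a worked illustration of these working limits, the sketch below applies the rule to a sample of fifteen observations. The sample values are invented purely for the example, and the helper name `working_limits` is hypothetical; only the multiplier 2.145 comes from the text.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of fifteen observations (invented for illustration only).
sample = [10.2, 9.8, 11.1, 10.5, 9.4, 10.9, 10.0, 9.7,
          10.6, 10.3, 9.9, 10.8, 10.1, 9.6, 10.4]

T_ONE_IN_FORTY = 2.145  # value quoted in the text for fifteen observations

def working_limits(xs, t_value):
    """Observed mean minus and plus t_value times its estimated standard error."""
    n = len(xs)
    x_bar = mean(xs)
    s = stdev(xs)            # divisor n - 1, as in the definition of s
    se = s / sqrt(n)         # estimated standard error of the mean
    return x_bar - t_value * se, x_bar + t_value * se

lower, upper = working_limits(sample, T_ONE_IN_FORTY)
print(lower, upper)
```

Each limit is exceeded with probability one in forty, so the pair encloses the unknown mean at the one-in-twenty level.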

In fact, since the distribution of t is known with exactitude, and
since t is given by the formula

\[ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \]




which involves, apart from μ, directly calculable quantities only,
namely x̄ and s, both of which are sufficient statistics, we may infer,
without any use of probabilities a priori, a frequency distribution for
μ which shall correspond with the aggregate of all such statements as
that made above, to the effect that the probability that μ is less than
x̄ − 2.145 s/√n is exactly one in forty.

It is, at first sight, easy to confuse probability statements respecting 
unknown parameters, derived by arguments similar to the above, with 
statements of inverse probability. Indeed, attempts have been made 
to use these arguments, by identifying the results to which they lead 
with statements of inverse probability, as a means of ascertaining 
which particular hypothesis of probabilities a priori should be adopted
in order to lead to equivalent conclusions. In reality the statements
with which we are concerned differ materially in logical content from
inverse probability statements, and it is to distinguish them from these 
that we speak of the distribution derived as a fiducial frequency 
distribution, and of the working limits, at any required level of 
significance, that may be derived from it as the fiducial limits at this 
level. This distinctive terminology is not intended to suggest that 
fiducial probability is not in the strictest sense a mathematical probability,
like any other to which the term ought to be applied, but that
it has been derived by a form of argument very different from that 
introduced by Bayes, and one which was unknown to all the early 
writers on the theory of probability. 

It is a matter of some historical interest to examine why a mode of 
reasoning so essentially simple, and so cogent, as that outlined above, 
should have escaped the penetration of the early writers, who include 
some of the most illustrious of mathematicians. There are two 
circumstances which may help to make clear this difficulty. The 
distributions studied by early writers were nearly all discontinuous
distributions, distributions in particular, of which the variates are
frequencies. When applied to these the fiducial type of argument
does not lead us to an exact frequency distribution of the unknown 
parameters, but only to a series of inequalities which add little in 
intelligibility to the tests of significance from which they may be 
derived. The neglect of the frequency distributions of continuous 
variates, until they were forced on the notice of mathematicians by 
the requirements of the quantitative sciences, is, I believe, one potent 
reason why early writers on probability were not led to use arguments 
of the fiducial type. For such arguments to be fruitful, moreover, the 
distributions considered must be not only continuous, but mathematically
exact. Exact solutions of all the more important and
immediate problems were possible by analytic methods certainly 
within the capacity of the greater writers of the last 150 years. That 
their existence remained for so long unknown, can only, I believe, be 
explained by the absence of any steady conviction that inferences 
involving an element of uncertainty deserve anything better than 
rough and approximate discussion. 

Two subsidiary circumstances, also, have in our own time greatly 
facilitated the new approach, and have, indeed, made its development 
inevitable. One is the convenient practice of tabulating the distributions
required, at a series of definite levels of significance, i.e., of
expressing the variate in terms of the probability, in place of regarding
the probability as a function of the variate. The second circumstance
is the abandonment of the inverse type of argument, since, so long as
statements of inverse probability were held to be the aim, the possibility
of making inferences of fiducial probability, which differ from
the former in logical content, was very naturally overlooked.

There is one peculiarity of uncertain inference which often presents 
a difficulty to mathematicians trained only in the technique of rigorous 
deductive argument, namely, that our conclusions are arbitrary, and 
therefore invalid, unless all the data, exhaustively, are taken into 
account. In rigorous deductive reasoning we may make any selection
from the data, and any certain conclusions which may be deduced from 
this selection will be valid, whatever additional data we may have at 
our disposal. The more philosophic writers on probability, however, 
such as Venn, have emphasized the fact that conclusions in this field 
are relative, not only to what is known, but also to what is undetermined.
Venn, for example, contrasts the conclusions to be drawn
from such items of information as that the death-rate of Englishmen 
is higher in Madeira than in England, and that the death rate of 
tuberculous patients is higher in England than in Madeira. The 
probable effect of a change of residence is different for the contrasted 
cases of a man chosen at random from the English population, as 
against one chosen at random from the tuberculous patients of 
that country. The additional datum that the individual chosen is 
tuberculous must not be ignored in drawing inferences from the remaining
data.

This peculiarity appears to be characteristic of uncertain inference 
in general. It is certainly as important in inductive reasoning from 
observational data, as in the purely deductive inferences of the classical 
theory of probability. Every statistician is conscious that if he were
to allow himself to make an arbitrary selection among the observational
material available, then the most orthodox operations of his
craft could be made to lead to almost any desired conclusion. The 
political principle that “Anything can be proved by statistics” thus
enshrines a subtle truth, which requires to be the more carefully borne 
in mind, the more we rely on mathematical techniques developed with 
only certain inferences in view. 

This consideration is vital to the fiducial type of argument, which 
purports to infer exact statements of the probabilities that unknown 
hypothetical quantities, or that future observations, shall lie within 
assigned limits, on the basis of a body of observational experience. 
No such process could be justified unless the relevant information
latent in this experience were exhaustively mobilized and incorporated
in our inference.

We may now appreciate the necessity of the condition I mentioned,
in connection with “Student’s” test of significance, for the mean of a
normal sample; namely, that the quantities x̄ and s which, together
with the unknown parameter μ, appear in the expression for t, should
be Sufficient estimates of the mean and standard deviation of the
population sampled. For this is a guarantee that they have, together,
tapped all the information the sample has to give respecting the nature
of the population. If alternative estimates had been used; if, for
example, we had found the median in place of the arithmetic mean,
x̄, or, if we had used Peters’ formula, based on the mean deviation,
in place of Bessel’s formula, based on the mean square, we might have
derived an entirely valid test of significance; that is to say we could
have found a quantity t′ with a distribution exactly known for
samples of a given size, and expressible, like t, in terms of the unknown
parameter, together with directly calculable quantities only.
But, if we had gone further, and, substituting for t′ in terms of μ,
had derived a fiducial distribution of the unknown parameter, the
distribution we should obtain would be based only on that part of the
information available, which our special estimates of the mean and
standard deviation had conserved. The distribution obtained would
differ from that found by using the sufficient estimates, and the
probability statements which it embodies would be discrepant. Without
the requirement that the information available should be exhausted,
a host of discrepant inferences would appear equally admissible,
each dependent from the personal choice of the statistician,
through his choice of the method of estimation to be employed.

When Sufficient estimation is possible, there is here no problem;
but the exhaustive treatment of the cases in which no Sufficient
estimate exists is now seen to be an urgent requirement. This at
present is in the interesting stage of being possible sometimes, though,
so far as we know, not always. I have spoken of the Sufficient estimates
as containing in themselves the whole of the information provided
by the data. This is not strictly accurate. There is always
one piece of additional, or ancillary, information which we require,
in conjunction with even a Sufficient estimate, before this can be
utilized. That piece of information is the size of the sample; or, in
general, the extent of the observational record. We always need to
know this in order to know how reliable our estimate is. Instead of
taking the size of the sample for granted, and saying that the peculiarity
of the cases where sufficient estimation is possible lies in the
fact that the estimate then contains all the further information required,
we might equally well have inverted our statement; and,
taking the estimate of maximum likelihood for granted, have said
that the peculiarity of these cases was that, in addition, nothing more
than the size of the sample was needed for its complete interpretation.
This reversed aspect of the problem is the more fruitful of the two,
once we have satisfied ourselves that, when information is lost, this
loss is minimized by using the estimate of maximum likelihood. The
cases in which Sufficient estimation is impossible are those in which,
in utilizing this estimate, other ancillary information is required from
the sample beyond the mere number of observations which compose
it. The function which this ancillary information is required to perform
is to distinguish among samples of the same size those from which
more or less accurate estimates can be made; or, in general, to distinguish
among samples having different likelihood functions, even
though they may be maximized at the same value. Ancillary information
never modifies the value of our estimate; it determines its
precision.

The procedure of this kind the most general possible would be, from 
a sample of n observations, to specify (a) the estimate, or set of 
estimates of the unknown parameters, having the greatest likelihood; 
and (b) a set of functionally independent ancillary statistics, sufficient 
in conjunction with (a) to allow the observations to be reconstructed 
in their entirety, and having the additional property that these ancillary
quantities shall be all distributed in samples in distributions
independent of the unknown parameters. It is easy to see that this
can be done in certain simple cases. For example, if μ is the only unknown
parameter in a frequency distribution specified by the differential
element

\[ df = \phi(x - \mu)\, dx, \]

then the differences between successive observations, when these are
arranged in order of magnitude, supply n − 1 functionally independent
quantities, calculable from the sample, the sampling distribution
of each of which is evidently independent of μ. We may,
therefore, regard such a set of differences as specifying the configuration
of the sample, and, in interpreting our estimate, may take as its
sampling distribution that appropriate to only those samples which
have the actual configuration observed.
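
The independence of the configuration from the location parameter can be seen in a small sketch. The deviations below are invented, and `configuration` is an illustrative helper, not a standard routine: the same set of deviations is shifted to two different centres and the successive differences are compared.

```python
def configuration(sample):
    """Differences between successive observations, arranged in order of magnitude."""
    ordered = sorted(sample)
    return [b - a for a, b in zip(ordered, ordered[1:])]

# Hypothetical deviations x - mu for a sample of five observations.
deviations = [0.75, -1.5, 0.25, 2.0, -0.5]

sample_a = [d + 10.0 for d in deviations]    # the same deviations with mu = 10
sample_b = [d + 250.0 for d in deviations]   # the same deviations with mu = 250

print(configuration(sample_a))
print(configuration(sample_b))
```

Both samples yield the same n − 1 differences, so nothing about the parameter can be learned from the configuration alone; it serves only to indicate which samples the estimate should be compared with.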

Here, then, is a second group of solutions, by which estimation may 
be made exhaustive, like the Sufficient statistics in depending from a 
special functional relationship, like them, also, in resolving a wide 
class of the problems arising in practice. And my final word on this 
topic is a query, the answer to which so far is unknown, and which is, 
therefore, at present a challenge to our mathematical intuition. 
May I put the problem in this form?:— 

The agricultural land of a pre-dynastic Egyptian village is of 
unequal fertility. Given the height to which the Nile will rise, the 
fertility of every portion of it is known with exactitude, but the 
height of the flood affects different parts of the territory unequally. 
It is required to divide the area, between the several households of 
the village, so that the yields of the lots assigned to each shall be in 
pre-determined proportion, whatever may be the height to which 
the river rises. 

If this problem is capable of a general solution, then it is possible in 
general to recognize something corresponding with the configuration 
of the sample in the simple case discussed above, and one of the 
primary problems of uncertain inference will have reached its complete 
solution. If not, there must remain some further puzzles to unravel. 





REFERENCES

Bayes, T.

1763. An Essay towards solving a problem in the Doctrine of Chances.
Phil. Trans. 53, p. 370.

Boole, G.

1854. Laws of Thought. Cambridge, p. 375.

De Morgan, A.

1838. An Essay on Probabilities and on their application to life contingencies
and Insurance Offices. Preface vi.

Edgeworth, F. Y.

1908. On the probable errors of frequency constants. J.R.S.S. 71, p. 387.

Fisher, R. A.

1925. Theory of Statistical Estimation. Proc. Camb. Phil. Soc. 22,
pp. 700-725.

1933. Two new properties of mathematical likelihood. Proc. Roy. Soc.
A, 144, pp. 285-307.

Gauss, K. F.

1809. Theoria Motus Corporum Coelestium. Hamburg, p. 210.

1839. Briefwechsel zwischen Gauss und Bessel. Leipzig (1880).

Laplace, P. S.

1814. Théorie analytique des probabilités. Paris. Second Edition,
p. ciii.

Pearson, K.

1920. The fundamental problem of practical statistics. Biometrika
xiii, pp. 1-16.

“Student”

1908. The probable error of a mean. Biometrika 6, pp. 1-25.

Venn, J. A.

1866. The Logic of Chance. Cambridge.





28 

A TEST OF THE SUPPOSED PRECISION OF 
SYSTEMATIC ARRANGEMENTS 


AUTHOR’S NOTE 

At the time when randomisation in experimental design was first advocated
a common objection was that it achieves its object usually
at the expense of increasing the variability when compared with
“balanced arrangements.” In this paper an extensive uniformity
test is used to examine the precision of the different arrangements
that might have been imposed upon it. Two dangers of regular balanced
arrangements that are pointed out are that experimenters using
them are inclined systematically to underestimate their errors,
in consequence of which Student's and other tests of significance may
not be even approximately correct, and further that with a systematic
arrangement the experimenter has an arbitrary choice between
several widely different estimates.


Reprinted from Annals of Eugenics, Vol. VII, Pt. II, pp. 189-193, 1936.





A TEST OF THE SUPPOSED PRECISION OF 
SYSTEMATIC ARRANGEMENTS

By Dr S. BARBACKI and R. A. FISHER, Sc.D., F.R.S.

I. Introduction 

In human as in biological experimentation frequent use is made of the method of pairing,
often, though erroneously, ascribed to “Student”, who has, however, expressly disclaimed
this invention.

Possibly because of its early introduction, pairing has been so frequently applied without
the precaution of randomization that, by force of example, it seems often to have been
thought that this method would provide a valid estimate of error even when systematic
pairs were used. Recently, indeed, in connexion with Beaven’s split drill method of
testing cereal varieties “Student” has claimed explicitly that higher precision is attainable
with systematic than with randomized arrangements.

The only method of testing such an assertion is by the direct application of the two
alternative methods to yields harvested in half drill strips from a trial using only a single
variety. For, though the precision attainable with the aid of randomization is well known
from the many trials carried out by workers who have taken this precaution of obtaining
unequivocally valid estimates of error, the precision of comparisons using a systematic
arrangement of split drills is not known, since the estimates of error derived from this
or any other systematic arrangement cannot be relied upon not to be biased in one direction
or the other. “Student” indeed expresses the opinion that with the split-drill method the
error is slightly over-estimated, and this, though contrary to what the following data will
show, may, in other cases, be true. It is, however, a somewhat back-handed compliment
to the method, for it implies, as “Student” does not appear to realize, that, using the
systematic split-drill method, results which, with randomization, would have been
recognized as significant will be passed over as without statistical significance. This has
been demonstrated with certain systematic square arrangements (O. Tedin, 1931).

When this occurs, the effort and outlay expended in carrying through an experimental 
programme may have been frustrated merely through neglecting to take the precautions 
needed to obtain a valid estimation of error. 

II. Experimental data 

Wiebe gives yields in grams of grain for 1500 16 ft. rows of wheat. In order to parallel
the situation in which the split-drill method is used these rows have been totalled in groups
of six, omitting one row between each consecutive pair of groups. Each group thus gives
the yields of a half-drill strip, of which there are sixteen running side by side, and twelve
end to end, as shown in Table I. As in the systematic split-drill method these are lettered
A B B A, A B B A, across the field. The differences of the yields between the half-drill
strips, taking A − B in each case, are shown in Table II.

Table I

      (i)   (ii)  (iii)   (iv)    (v)   (vi)  (vii) (viii)   (ix)    (x)   (xi)  (xii)
A    4410   4035   3865   3640   3650   3985   3490   3330   3358   3712   3487   3781
B    3950   3865   3295   2960   2925   3685   3400   3040   2889   3195   3496   3576
B    4185   4075   3325   2860   2965   3770   3240   2735   2764   3460   3273   3442
A    3785   3515   3255   2815   2630   3295   2875   2630   2775   3040   2940   3152
A    3870   3780   3660   2980   2650   3250   2925   2915   2933   3277   3042   3363
B    3910   3690   3705   3050   2910   3630   2985   3130   2986   3040   2778   3123
B    3890   3695   3720   2990   2970   3315   2910   2985   2851   2635   2906   3081
A    4190   3970   4335   3350   3325   3870   3120   3015   3097   2909   2936   3628
A    4170   4070   4455   3610   3365   3460   2970   2855   2877   2834   3020   3632
B    4015   4480   4730   3805   3375   3545   3080   2810   2794   2974   2770   3805
B    4150   4755   5065   4125   3550   3740   3425   2690   2789   2810   2895   3695
A    4190   4740   5265   4415   3675   3965   3685   3030   2782   2904   3080   3798
A    4095   5075   5495   4270   3760   4010   3695   3255   2759   3118   3287   3547
B    3805   4360   4415   3870   3585   3785   4025   3300   3199   3407   3473   3572
B    4005   4225   3840   3800   3780   3780   4025   3710   3564   3616   3539   3853
A    3700   4325   3550   3455   3540   3660   3980   3705   3577   3759   3558   3673

Table II. Differences (A − B) between the yields of pairs of half-drill strips

         (i)   (ii)  (iii)   (iv)    (v)   (vi)  (vii) (viii)   (ix)    (x)   (xi)  (xii)
         460    170    570    680    725    300     90    290    469    517     -9    205
        -400   -560    -70    -45   -335   -475   -365   -105     11   -420   -333   -290
         -40     90    -45    -70   -260   -380    -60   -215    -53    237    264    240
         300    275    615    360    355    555    210     30    246    274     30    547
         155   -410   -275   -195    -10    -85   -110     45     83   -140    250   -173
          40    -15    200    290    125    225    260    340     -7     94    185    103
         290    715   1080    400    175    225   -330    -45   -440   -289   -186    -25
        -305    100   -290   -345   -240   -120    -45     -5     13    143     19   -180
Total    500    365   1785   1075    535    245   -350    335    322    416    220    427

Since in using randomized half-drill strips it would usually be thought preferable to 
maintain the sandwich arrangement ABBA or B A A B, and to choose between these 
alternatives at random for each sandwich, Table III shows in a similar arrangement the 
differences in yield for the forty-eight sandwiches so obtained. 

Table III. Differences (A − B − B + A) given by the yields of sandwiches

         (i)   (ii)  (iii)   (iv)    (v)   (vi)  (vii) (viii)   (ix)    (x)   (xi)  (xii)
          60   -390    500    635    390   -175   -275    185    480     97   -342    -85
         260    365    570    290     95    175    150   -185    193    511    294    787
         195   -425    -75     95    115    140    150    385     76    -46    435    -70
         -15    815    790     55    -65    105   -375    -50   -427   -146   -167   -205
Total    500    365   1785   1075    535    245   -350    335    322    416    220    427







The statistical effects of using a systematic or a randomized arrangement are now easily
compared. The sum of the yields from the A strips is 339,535 g. That from the B strips
is 333,660, the difference in favour of A being 5875. The mean yield is 336,598, so that the
actual error, using the systematic arrangement, is 1.745 per cent.

The sum of the squares of the forty-eight differences is 5,538,279, and this will be equal
to the sampling variance of the difference between the totals, if the sandwiches are
randomized. In such a case, therefore, the standard error of random sampling is 2353.35 g.,
or 0.699 per cent. of the average yield. The actual error of the systematic arrangement is
thus nearly 2.5 times as great as the standard error obtainable by a randomized experiment
of the same scope.
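
The arithmetic of this comparison can be retraced from the totals quoted in the text. The sketch below is only a check of the stated figures, with illustrative variable names. It rests on the reasoning that, when each sandwich is assigned at random to A B B A or B A A B, each of the forty-eight differences enters the total with a random sign, so that the variance of the total is the sum of their squares.

```python
from math import sqrt

sum_a = 339_535              # total yield of the A strips, as stated in the text
sum_b = 333_660              # total yield of the B strips, as stated in the text
ss_sandwiches = 5_538_279    # stated sum of squares of the 48 sandwich differences

difference = sum_a - sum_b               # actual error of the systematic arrangement
mean_yield = (sum_a + sum_b) / 2
actual_error_pct = 100 * difference / mean_yield

# Random signs on the 48 differences: variance of the total = sum of squares.
random_se = sqrt(ss_sandwiches)
random_se_pct = 100 * random_se / mean_yield

ratio = actual_error_pct / random_se_pct
print(difference, round(actual_error_pct, 3), round(random_se_pct, 3), round(ratio, 2))
```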

The analysis of variance of the results of the systematic trial is as follows: 



                    Degrees of     Sums of        Mean
                    freedom        squares        square
Varieties                1           719,076      719,076
Estimated error         47         4,819,203      102,536
Total                   48         5,538,279

The standard error of the difference between the total yields, as estimated from the
experiment, is 2218.50, or 0.659 per cent. It will be noticed that, not only has the systematic
experiment the higher real error, but that it yields a lower estimate of error than the
randomized experiment. The test of significance is vitiated for both reasons. Moreover,
if, as “Student” appears to expect, the real errors of the systematic arrangement had
been lower than those of randomized arrangement, it is evident that the estimate of error
would have been correspondingly raised, so that such experiments would be less sensitive
than random experiments in detecting doubtfully significant differences. An inaccurate
estimate of error is a disadvantage in whichever direction it is biased.
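
The entries of the analysis of variance follow from two stated quantities, the difference between the totals and the sum of squares of the forty-eight differences. The sketch below retraces them; the function name and its arguments are illustrative, not taken from the paper.

```python
from math import sqrt

def anova_of_differences(total_difference, sum_of_squares, k):
    """Split the sum of squares of k differences into 'varieties' (1 d.f.) and error (k - 1 d.f.)."""
    varieties_ss = total_difference ** 2 / k
    error_ss = sum_of_squares - varieties_ss
    error_ms = error_ss / (k - 1)
    # Estimated standard error of the total of the k differences.
    estimated_se = sqrt(k * error_ms)
    return varieties_ss, error_ss, error_ms, estimated_se

varieties_ss, error_ss, error_ms, estimated_se = anova_of_differences(5875, 5_538_279, 48)
print(round(varieties_ss), round(error_ss), round(error_ms), round(estimated_se, 2))

# Expressed as a percentage of the mean yield quoted in the text.
estimated_se_pct = 100 * estimated_se / 336_598
print(round(estimated_se_pct, 3))
```

The estimate is smaller than the random-sampling value because the part of the sum of squares assigned to "varieties" is, in a uniformity trial, itself nothing but error.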


III. Randomized pairs 

It is interesting for comparison to examine the results of randomizing not sandwiches 
of four half-drill strips, but pairs of half-drill strips. This would, of course, be usually 
expected to be a less favourable form of randomization. We have, however, seldom so 
good an opportunity of examining the exact advantage of the sandwich arrangement. 

The sum of the squares of the 96 values in Table II is 9,387,099. The standard error of
the differences in total yields is, therefore, 3063.84, or 0.912 per cent. The randomized
pairs of plots have only 77 per cent. of the efficiency of the randomized sandwiches. This
emphasizes the value of the current opinion.





For the systematic experiment the analysis of variance, using pairs, is as follows: 



                    Degrees of     Sums of        Mean
                    freedom        squares        square
Varieties                1           359,538      359,538
Estimated error         95         9,027,561       95,027
Total                   96         9,387,099


The standard error for the systematic experiment is again under-estimated, being
3020.35 g., or 0.897 per cent. It should be noticed that when a systematic experiment
has been carried out, there is no more reason for estimating the error from pairs than
from sandwiches. Different workers might, with equal justification, arrive at the estimate
0.897 per cent., or at the estimate 0.659 per cent. Neither has, in fact, any objective
justification. Numerous alternative estimates can be equally suggested, as, for example,
by “Student”, who from 96 pairs of half-drill strips proposes to use 94 degrees of freedom
for error, i.e. those used above less one representing “fertility slope”. The true error,
however, remains unaltered, in this case at 1.745 per cent. When randomization has been
practised there is no such ambiguity. When pairs have been randomized they must be
used in the estimate of error; when sandwiches have been randomized they supply the
only justified basis. When the whole arrangement is systematic no valid estimate is
possible, and any estimate arrived at is due to the arbitrary choice of the estimator.
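
The analysis by pairs can be retraced from the individual differences. The sketch below assumes the 96 values of Table II as read here, with digits checked against the column totals and the stated sum of squares, so it should be taken as a transcription for illustration rather than as the authors' worksheet. Variable names are illustrative.

```python
from math import sqrt

# Differences (A - B) of Table II, one inner list per line of the table,
# columns (i) to (xii). Transcribed values; see the note above.
table_ii = [
    [460, 170, 570, 680, 725, 300, 90, 290, 469, 517, -9, 205],
    [-400, -560, -70, -45, -335, -475, -365, -105, 11, -420, -333, -290],
    [-40, 90, -45, -70, -260, -380, -60, -215, -53, 237, 264, 240],
    [300, 275, 615, 360, 355, 555, 210, 30, 246, 274, 30, 547],
    [155, -410, -275, -195, -10, -85, -110, 45, 83, -140, 250, -173],
    [40, -15, 200, 290, 125, 225, 260, 340, -7, 94, 185, 103],
    [290, 715, 1080, 400, 175, 225, -330, -45, -440, -289, -186, -25],
    [-305, 100, -290, -345, -240, -120, -45, -5, 13, 143, 19, -180],
]

differences = [d for row in table_ii for d in row]
k = len(differences)                                   # 96 pairs
column_totals = [sum(col) for col in zip(*table_ii)]
total_difference = sum(differences)                    # difference between A and B totals
sum_of_squares = sum(d * d for d in differences)

varieties_ss = total_difference ** 2 / k               # 1 degree of freedom
error_ms = (sum_of_squares - varieties_ss) / (k - 1)   # 95 degrees of freedom
estimated_se = sqrt(k * error_ms)                      # estimated s.e. of the total difference
estimated_se_pct = 100 * estimated_se / 336_598        # mean yield quoted in the text

print(column_totals)
print(total_difference, sum_of_squares, round(error_ms), round(estimated_se_pct, 3))
```

Grouping the same values two at a time into sandwiches before squaring gives a different estimate from the same field, which is the ambiguity the text describes.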


IV. “Student’s” test of significance 

Since the uniformity trial under discussion is more extensive than most practical tests,
and achieves a higher real precision than is usually attained, it may conveniently be
subdivided into six minor experiments. To do this, and to minimize the effect of neighbourhood
end to end, we may take together the first and seventh series, the second and
eighth, etc., so as to make six experiments each with 32 half-drill strips. For each of these
six experiments “Student’s” t was calculated as a test of significance, and the values
observed are compared with the theoretical distribution given by “Student”. In both
trials the six values of t are all positive, whereas in theory they should with equal frequency
be positive and negative. The positive bias of the values is evidently sufficient to ruin the
exactitude of the test of significance for which we are indebted to “Student”.

Table IV. Values of “Student’s” t observed compared with their theoretical distributions

            Sandwiches, n = 7                            Pairs, n = 15
Values of t           Number     Number      Values of t           Number     Number
                      expected   observed                          expected   observed
-∞ to -1.415            0.6         -        -∞ to -1.341            0.6         -
-1.415 to -0.896        0.6         -        -1.341 to -0.866        0.6         -
-0.896 to -0.549        0.6         -        -0.866 to -0.536        0.6         -
-0.549 to -0.263        0.6         -        -0.536 to -0.258        0.6         -
-0.263 to 0             0.6         -        -0.258 to 0             0.6         -
0 to +0.263             0.6         1        0 to +0.258             0.6         1
+0.263 to +0.549        0.6         -        +0.258 to +0.536        0.6         -
+0.549 to +0.896        0.6         2        +0.536 to +0.866        0.6         3
+0.896 to +1.415        0.6         1        +0.866 to +1.341        0.6         1
+1.415 to +∞            0.6         2        +1.341 to +∞            0.6         1
Total                   6           6                                6           6
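
The class limits of Table IV can be checked against Student's distribution. The sketch below assumes that the limits are the deciles of t for 7 and for 15 degrees of freedom (taking the limits as 0.263, 0.549, 0.896, 1.415 and 0.258, 0.536, 0.866, 1.341), and integrates the density by Simpson's rule; the function names are illustrative.

```python
from math import gamma, pi, sqrt

def t_density(t, df):
    """Density of Student's t with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (gamma(df / 2) * sqrt(pi * df))
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, by Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 + total * h / 3

limits = {7: [0.263, 0.549, 0.896, 1.415], 15: [0.258, 0.536, 0.866, 1.341]}
for df, bounds in limits.items():
    # Each limit should cut off a further tenth of the distribution.
    print(df, [round(t_cdf(b, df), 3) for b in bounds])

expected_per_class = 6 * 0.1   # six experiments spread over ten equally probable classes
p_all_positive = 0.5 ** 6      # chance that six independent unbiased values are all positive
print(expected_per_class, p_all_positive)
```

With ten equally probable classes each expects 0.6 of the six values, and six positive values out of six would occur by chance only once in sixty-four trials if the test were unbiased.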

SrMMARY 

1. This enquiry was carried out to test the truth of the opinion expressed by “ Student ” 
that randomization achieves its object “ usually at the expense of increasing the variability 
when compared with balanced arrangements”, and that one of the means available to 
experimenters of reducing the error is by adopting “a regular balanced arrangement”. 

2. Using an extensive uniformity test it is found that the arrangements randomizing 
either pairs or sandwiches of half-drill strips give smaller errors than the systematic 
arrangement advocated as more precise. 

3. As a consequence experimenters using the systematic arrangement systematically 
underestimate their errors. 

4. The error estimated from a systematic arrangement is ambiguous, and the experimenter
has an arbitrary choice between several widely different estimates.

5. Owing to the failure to furnish a valid estimate of error, “Student’s” test of
significance is not approximately correct for systematic arrangements.

REFERENCES 

“Student” (1936). “Co-operation in large-scale experiments.” J. roy. statist. Soc. Supplement.

Tedin, O. (1931). “The influence of systematic plot arrangement upon the estimate of error in field experiments.” J. agric. Sci. 21, 191-208.

Wiebe, G. A. (1935). “Variation and correlation in grain yield among 1500 wheat nursery plots.” J. agric. Res. 50, 331-57.





29 

PROFESSOR KARL PEARSON AND THE 
METHOD OF MOMENTS 


AUTHOR’S NOTE 

This is the only paper involving personal criticism I have had reason 
to write. The occasion was an attack by Pearson, which on examina¬ 
tion appeared to be flagrantly unfair, on the work of an Indian statis¬ 
tician holding a subordinate position, whom I had reason to know 
to be conscientious and inoffensive in character, in the right on the 
matter under dispute, and in grave danger of injury to his position 
and prospects, through the prestige that his attacker still enjoyed. 
It seemed fitting to make entirely plain the results of a careful ex¬ 
amination of the case. 

Pearson was an old man when it occurred to him to attack Koshal, 
but it would be a mistake to regard either the errors or the venom 
of that attack as a sign of failing powers. In both respects it is very 
much like what he had done repeatedly since the beginning of the 
century. If peevish intolerance of free opinion in others is a sign of 
senility, it is one which he had developed at an early age. Unscrupu¬ 
lous manipulation of factual material is also a striking feature of the 
whole corpus of Pearsonian writings, and in this matter some blame 
does seem to attach to Pearson’s contemporaries for not exposing 
his arrogant pretensions. 


Reprinted from Annals of Eugenics, Vol. VII, Pt. IV, pp. 303-318, 1937.





PROFESSOR KARL PEARSON AND THE 
METHOD OF MOMENTS 

By R. A. FISHER, Sc.D., F.R.S.

I. Apology 

Shortly before his death the late Prof. Karl Pearson wrote a paper on the method of 
moments, which was published in his Journal, Biometrika, for June 1936. The paper is
unfortunately marred by great bitterness, and by vehement attacks on an Indian writer,
R. S. Koshal, whose offence appears to be that of deciding, after trial, that a curve of
Pearson’s type I could be fitted to his data more successfully by the method of maximum
likelihood than by the method of moments. Pearson’s allusions to Koshal’s work, some of
which will have to be quoted, are so unjust as to be quite inexplicable. His paper, however,
deserves attention as an authentic exposition of the manner in which followers of Prof. 
Pearson should proceed in fitting frequency curves, and as the first attempt to answer the 
doubts and difficulties which have been felt by others with respect to the methods he had 
previously put forward. I hope later to publish Dr Koshal’s further examination of the 
problem, and for the present shall confine attention to Prof. Pearson’s methods. 

Though the occasion of this paper is Pearson’s attack on Koshal, it has been impossible 
to treat the matter in due perspective without a general criticism of methods originating with 
Pearson, which have been widely disseminated. The intrinsic worth of those methods has 
long appeared to me to have been gravely exaggerated. Pearson opens his paper with the
italicized query “Wasting your time fitting curves by moments, eh?” thus expressing in his
own words and style the scepticism with which he felt his procedures were being regarded
by others. The question he raised seems to me not at all premature, but rather overdue.
During his last years Pearson’s intimates have earnestly represented that irritation and
controversy might be dangerous to his health. In consequence, the discussion of many points
has been suspended. With Pearson’s death my obligation to examine frankly the status of the
Pearsonian methods reasserts itself, and in the last section I have attempted a brief review
of the situation as it now stands. 

II. Pearson’s attempted solution by moments

On p. 43 Pearson gives his solution as follows:

“equation to curve in Professor Fisher’s form:

$$y = y_0\,(x - 0.6789{,}0842)^{0.4447{,}2615}\,(15.9508{,}4325 - x)^{4.0126{,}3698},$$

where $\log y_0 = \bar{3}.629{,}3626$”,

and on p. 44 he gives in a table a comparison between the observed frequencies and the
expectations supposedly derived from this curve. These are shown in Table I.



The figures are as Pearson gives them, save that I have inserted the totals of the
frequencies, corrected the sign of the deviation in two cases and in the two final columns given
the individual values for the last five classes. Pearson gives the total deviation -0.53, and
the contribution to χ² 0.0373, as though χ² were to be used for testing significance. As a
matter of fact, the rough fitting originally given by Koshal and Turner had settled the
question of significance by showing that the series of frequencies observed were quite
compatible with the view that the breaking strength for this particular sample of fibres (though
apparently not for others) followed a curve of Pearson’s type I. Throughout Pearson’s paper
χ² is used, on the contrary, as a basis for claiming that the method of moments, as used by
Pearson, gives a better fit than do other methods, and for this purpose it is clearly illogical
to ignore the observed distribution of the seven values exceeding 10.95.

Table I

Breaking    Expected     Observed     Deviation
strength    frequency    frequency    a - m        (a - m)²/m
in g.       m            a

 0.45        32.87         38          +5.13        0.8006
 1.45       169.17        165          -4.17        0.1026
 2.45       186.72        188          +1.28        0.0085
 3.45       167.20        159          -8.20        0.4022
 4.45       137.04        137          -0.04        0.0000
 5.45       105.48        114          +8.52        0.6882
 6.45        76.72         81          +4.28        0.2385
 7.45        52.60         48          -4.60        0.4023
 8.45        33.82         29          -4.82        0.6869
 9.45        20.07         19          -1.07        0.0570
10.45        10.76         15          +4.24        1.6708
11.45         5.02          6          +0.98        0.1913
12.45         1.91          -          -1.91        1.9100
13.45         0.52          1          +0.48        0.4431
14.45         0.08          -          -0.08        0.0800
15.45         0.00          -

Total       999.98       1000          +0.02        7.6820

A comparison of the algebraic form of the curve, with the expectations ascribed to it, shows 
at once that they are widely discrepant. On examination, the discrepancy is found to have 
more than one cause. The simplest of these is that the two termini of the curve given in the 
equation are both too large by 0.05 and should be taken to be
0.6289,0842 and 15.9008,4325.

Correcting this slip, it is found that Pearson’s expectations agree sufficiently well with the 
curve to show that this is the curve he has used, but not sufficiently well to make the com¬ 
parison Pearson had in view. The frequencies derived from Pearson’s formula are given below 
(Table II) along with those which he gives. 
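The recalculation of expectations from the corrected curve can be imitated by direct numerical quadrature. The sketch below is not Fisher's own procedure: it assumes the termini and exponents as read from the printed equation (the digits should be checked against a clean copy), takes unit classes with limits at -0.05, 0.95, ..., 15.95, uses a simple midpoint rule, and rescales the curve to a total of 1000 instead of relying on Pearson's value of log y0. All names are illustrative.

```python
# Assumed reading of Pearson's Type I curve after correcting both termini by 0.05.
A, B = 0.62890842, 15.90084325        # lower and upper termini
M1, M2 = 0.44472615, 4.01263698       # exponents of (x - A) and (B - x)

def curve(x):
    """Unnormalised ordinate of the curve; zero outside the termini."""
    if x <= A or x >= B:
        return 0.0
    return (x - A) ** M1 * (B - x) ** M2

def area(lo, hi, steps=4000):
    """Composite midpoint rule for the area under the curve between lo and hi."""
    lo, hi = max(lo, A), min(hi, B)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(curve(lo + (i + 0.5) * h) for i in range(steps))

def expected_frequencies(total=1000.0):
    """Expected numbers in the unit classes centred at 0.45, 1.45, ..., 15.45."""
    edges = [k - 0.05 for k in range(17)]          # -0.05, 0.95, ..., 15.95
    areas = [area(edges[k], edges[k + 1]) for k in range(16)]
    scale = total / sum(areas)                     # normalise to the sample size
    return [scale * a for a in areas]

freqs = expected_frequencies()
for k, f in enumerate(freqs):
    print(f"{k + 0.45:5.2f}  {f:9.4f}")
```

Under these assumptions the first two classes come out near 30.59 and 171.21, in agreement with the recalculated column and not with the 32.87 and 169.17 printed by Pearson.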

It may be seen that, apart from a casual error in the eighth panel, the errors of Pearson’s
quadrature are concentrated at the head of the table. They are sufficiently large to vitiate the
comparison with the observed frequencies, and to diminish the value of χ² by more than a
unit. From a curious section in Pearson’s paper it appears that these errors were deliberate.
Speaking of Koshal’s method of quadrature, he says (p. 42):

“By direct independent quadrature of the areas on the first and second subranges (and
even on the third subrange) we assured ourselves that this process is inadequate. However,
as it has been used... we will adopt the same process, so that any controversy arising from
that paper may not be increased by differences as to the suitable methods of quadrature.”

This is a very strange reason to give for presenting incorrect values, and it becomes even
less plausible when we note that Pearson does not compare his series of expected frequencies
with any that are given by Koshal, and which might be expected, for the reason he gives, to
require a similar correction. He does not, in fact, show that Koshal’s expectations actually
are in error in the way that he suspects; but, apparently knowing what he is doing, gives
false frequencies which make his series of expectations appear to agree more nearly with the
facts than they would if he gave the correct values.

Table II. Numerical errors in Pearson’s theoretical frequencies

Breaking   Expectations   Errors of      Recalculated    Observed      Deviations
strength   given by       Pearson’s      expectations    frequencies   (a - m)      (a - m)²/m
in g.      Pearson        expectations   (m)             (a)

 0.45       32.87          +2.28           30.5932          38          +7.4068      1.7932
 1.45      169.17          -2.04          171.2071         165          -6.2071      0.2250
 2.45      186.72          -0.16          186.8828         188          +1.1172      0.0067
 3.45      167.20          -0.03          167.2268         159          -8.2268      0.4047
 4.45      137.04          -0.01          137.0522         137          -0.0522      0.0000
 5.45      105.48                         105.4831         114          +8.5169      0.6877
 6.45       76.72                          76.7232          81          +4.2768      0.2377
 7.45       52.60          -0.04           52.6443          48          -4.6443      0.4097
 8.45       33.82                          33.8230          29          -4.8230      0.6877
 9.45       20.07                          20.0724          19          -1.0724      0.0573
10.45       10.76                          10.7604          15          +4.2396      1.6704
11.45        5.02                           5.0217           6          +0.9783      0.1906
12.45        1.91                           1.9114           -          -1.9114      1.9114
13.45        0.52                           0.5217           1          +0.4783      0.4385
14.45        0.08                           0.0752           -          -0.0752      0.0752
15.45        0.00                           0.0022           -          -0.0022      0.0022

Total      999.98           0.00         1000.0007*       1000          -0.0007      8.7980

* The excess in the total expectation is due to the use of Pearson’s value for log y0, which is wrong in the
seventh place.
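The effect of the errors of quadrature on χ² can be checked directly from the frequencies of Tables I and II. The sketch below simply recomputes the sum of (a - m)²/m for the two series; the function name is illustrative, and the class with zero expectation is skipped, which is an assumption about how such a class should be treated.

```python
# Observed frequencies and the two series of expectations, transcribed from Tables I and II.
observed = [38, 165, 188, 159, 137, 114, 81, 48, 29, 19, 15, 6, 0, 1, 0, 0]
given = [32.87, 169.17, 186.72, 167.20, 137.04, 105.48, 76.72, 52.60,
         33.82, 20.07, 10.76, 5.02, 1.91, 0.52, 0.08, 0.00]
recalculated = [30.5932, 171.2071, 186.8828, 167.2268, 137.0522, 105.4831,
                76.7232, 52.6443, 33.8230, 20.0724, 10.7604, 5.0217,
                1.9114, 0.5217, 0.0752, 0.0022]

def chi_square(obs, exp):
    """Sum of (a - m)^2 / m over the classes with a non-zero expectation."""
    return sum((a - m) ** 2 / m for a, m in zip(obs, exp) if m > 0)

chi_given = chi_square(observed, given)
chi_recalculated = chi_square(observed, recalculated)
print(round(chi_given, 3), round(chi_recalculated, 3))
```

Taking all classes separately, the recalculated expectations give a χ² larger by rather more than a unit than the frequencies Pearson printed.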

The second distribution, with which Pearson compares his own, consists of a series of
expectations concocted by himself, but which he ascribed to the method of maximal
likelihood. This ascription he clearly knew to be untrue, for he asserted that his fit not only gave
a lower χ², but also a higher likelihood. The claim that his error has been introduced for the
sake of comparability must, however, be taken to mean that he has introduced errors, of
similar magnitude and sign, into both series of expectations. Now the deviations in these
two series have the same sign, except in the first two panels, where the error, for which
Pearson introduces the excuse of comparability, is greatest. For these two panels they are of
opposite sign. Thus, an error of 2.28 in Pearson’s first expectation diminishes the apparent
deviation from +7.41 to +5.13. But a similar error would increase the deviation ascribed
to the method of maximal likelihood from -1.71 to -3.99. In the same way, in the second
panel, his error -2.04 diminishes the apparent deviation of the “Method of Moments”
from -6.21 to -4.17. But it would increase the deviation ascribed to the method of maximal
likelihood from 11.90 to 13.94.

Whereas, therefore, the error of quadrature which Pearson has introduced diminishes the
value of χ² (as he reckons it), ascribed to the method of moments, from 6.2137 to 5.1151, the
same error has been used to increase the χ² ascribed to the method of maximal likelihood by
about the same amount. The difference he shows in favour of his method is only 1.8416. One
is led to suspect that, without the introduction of this error of computation, even the
comparison which he had chosen to make, artificial as in other ways it is, would have disappointed
his expectation of being able to claim a smaller value of χ².

The plea of comparability is, therefore, only an excuse for falsifying the comparison he
wished to present. It is important that such tricks should be exposed, since their aim is not
to advance knowledge, but to hinder its advance.

III. Failure to equalize the moments 

The calculation of comparatively accurate values for the expectations in Pearson’s 
solution enables us to examine a further point. Pearson seemed to have no doubt that his 
solution might stand as a model of the manner in which the method of moments should be 
applied. “This ”, he says (p. 43) of the solution we are examining, “is the best result that the 
method of moments can give.” It must, however, be questioned whether the method of
moments has given the result at all. In Pearson’s procedure the moments of the grouped
data are adjusted in accordance with certain formulae, called “corrections for abruptness”,
published in 1918 by Pairman and Pearson. The confidence, not to say arrogance, with
which Pearson regarded those adjustments is shown by his criticism of Koshal and Turner,
who, like most workers in this subject, had omitted to use them. The reproof might have
come from an angry pedagogue!

“This is no excuse* in 1930 for not correcting the moments for abruptness!”

Yet, as will be seen, the adjustments supplied by Pearson are found, on examination, to
be very inaccurate, and to have led him wide of the mark. Let us see how the moments of
Pearson’s own curve have, in fact, been affected by grouping. This can be done directly, once
the expectations are accurately calculated, by comparing the moments of the expectations
with those of the ungrouped curve from which they were derived. Taking moments about
the mid-point of the fifth class, i.e. 4.45, the comparison is as follows:

* Koshal and Turner had made no excuse, and had not indeed even discussed using these “corrections for 
abruptness.” 



Table III. Moments, about 4.45, of Pearson’s solution, ungrouped and grouped

                                  1st        2nd        3rd        4th
Pearson’s solution ungrouped     -0.4043     5.5952     4.1855     86.6341
Pearson’s solution grouped       -0.4039     5.6801     4.0553     89.6969
Difference                       -0.0004    -0.0849    +0.1302     -3.0628


We may first compare these differences with Sheppard’s adjustments; since the means
of the distributions are unequal, the moments are compared about a convenient fixed point
not equal to the mean. In consequence the third moment will also be affected by Sheppard’s
adjustments.

Table IV. Effects of grouping compared with Sheppard’s adjustments

                                  1st        2nd        3rd        4th
Ungrouped - grouped, actual      -0.0004    -0.0849    +0.1302    -3.0628
Sheppard’s adjustment             0         -0.0833    +0.1025    -3.2181
Difference                       -0.0004    -0.0016    +0.0277    +0.1553


The last line shows the further adjustments needed, beyond those given by Sheppard’s
formulae, if the true curve were really of the form found by Pearson. The claim that the
so-called “corrections for abruptness” are an improvement on Sheppard’s adjustments may
be tested by comparing the further adjustments needed with those imposed by the method
of Pairman and Pearson.

Table V. Comparison of adjustment needed for Pearson’s curve, with that supplied
by the “abruptness corrections” after using Sheppard’s adjustments

                                  1st        2nd        3rd        4th
Adjustment needed                -0.0004    -0.0016    +0.0277    +0.1553
“Abruptness correction” less
  Sheppard’s adjustment          +0.0057    -0.0495    +0.3170    -1.4118

The adjustments provided by the abruptness coefficient seem quite unrelated to those
needed for the form of curve actually found by Pearson. Two are of the same sign, and two
of opposite sign. In every case Pairman and Pearson’s values are much too large. Of the
cases in which the “correction” is in the right direction, it is 30 times too great for the second
moment, and nearly 12 times too great for the third. For the purpose of allowing for the
effects of grouping, it would have been considerably more accurate to ignore the “corrections
for abruptness” altogether. The errors after applying them are respectively 15, 30, 10 and 10
times as great as they were before!
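The ratios quoted can be verified from the two lines of Table V. In the sketch below the error left when the abruptness correction is ignored is taken to be the adjustment still needed, and the error left after applying it is the difference between the correction applied and the adjustment needed; the variable names are illustrative.

```python
# Values transcribed from Table V, for the 1st to 4th moments.
needed = [-0.0004, -0.0016, 0.0277, 0.1553]     # adjustment needed beyond Sheppard's
applied = [0.0057, -0.0495, 0.3170, -1.4118]    # "abruptness correction" less Sheppard's

ratios = []
for n, a in zip(needed, applied):
    error_before = abs(n)        # error if the abruptness correction is not used
    error_after = abs(a - n)     # error remaining once it has been applied
    ratios.append(error_after / error_before)

print([round(r, 1) for r in ratios])
```

Rounded to whole numbers these ratios reproduce the figures 15, 30, 10 and 10 given in the text.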









It is, I suppose, because these corrections have been so seldom used that their fallacious 
character has not been previously emphasized. Statisticians who doubted their utility have 
merely avoided their use, having, in general, little interest in Pearsonian curves as theoretical 
forms, and, in particular, knowing that the method of moments would not be an efficient 
way of fitting these curves, if they were wanted. Even so, the fact that errors so gross should 
have retained their author’s confidence for 18 years must imply a strange lack of the power 
of self-criticism. Have they never before been put to an objective test? Or, have the results 
of such tests been suppressed or disregarded ? 


IV. A more accurate fit by moments

The equations of estimation by the method of moments are

$$S(m_r x_r^p) = S(a_r x_r^p), \qquad p = 0, 1, 2, 3, 4,$$

where $m_r$ is the expected and $a_r$ the observed frequency in the rth panel, of which the
mid-abscissa is $x_r$. This is known not to be an efficient method of estimation for Pearsonian curves
(Fisher, 1921). In large samples the sampling errors of the estimates will be larger than those
derived by the method of maximal likelihood, which has been proved to be efficient in general.
It is, however, a consistent method, and for coarsely grouped data, like those under
discussion, it may not be very inefficient.

It has been shown that Pearson, through reliance on the “Pairman and Pearson
corrections”, failed to satisfy the equations of moments. It is noteworthy that, though an ardent
advocate of the method of moments, he never developed a method of satisfying its
fundamental equations. Knowing, however, the actual effects of grouping on any approximate
solution, such as that provided by Koshal and Turner in 1930, or that of Pearson discussed
above, an improved solution may be obtained by adjusting the ungrouped moments by
amounts equal and opposite to the observable errors in the grouped moments. The iteration
of this process, though tedious, may be expected to converge upon the moment solution. The
problem of making a more accurate fit by moments will be discussed in Section VI. Starting
with Pearson’s curve, for which the moments have been worked out above, Table VI shows
the approximate solution of the moment equations obtained at the second iteration.
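The iterative adjustment described above can be sketched in a much simplified setting. The example below is an assumption-laden illustration, not the calculation of the paper: it replaces the four-moment type I curve by a normal curve with two moments, groups it into unit classes centred at k + 0.45, and repeatedly alters the ungrouped mean and variance by amounts equal and opposite to the errors of the grouped moments. The targets 4.04 and 5.5599 are the grouped mean and variance of the data (from -0.41 and 5.7280 about 4.45); the function names are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def grouped_moments(mean, var, lo=-40, hi=60):
    """Mean and variance of a normal curve after grouping into unit classes
    centred at k + 0.45, i.e. with class limits k - 0.05 and k + 0.95."""
    sd = math.sqrt(var)
    s0 = s1 = s2 = 0.0
    for k in range(lo, hi):
        p = normal_cdf(k + 0.95, mean, sd) - normal_cdf(k - 0.05, mean, sd)
        mid = k + 0.45
        s0 += p
        s1 += p * mid
        s2 += p * mid * mid
    g_mean = s1 / s0
    return g_mean, s2 / s0 - g_mean ** 2

def fit_by_iteration(target_mean, target_var, rounds=5):
    """Adjust the ungrouped moments until the grouped moments equal the targets."""
    mean, var = target_mean, target_var          # start from the grouped values
    for _ in range(rounds):
        g_mean, g_var = grouped_moments(mean, var)
        mean += target_mean - g_mean             # equal and opposite to the error
        var += target_var - g_var
    return mean, var

mean, var = fit_by_iteration(4.04, 5.5599)
check_mean, check_var = grouped_moments(mean, var)
print(round(mean, 4), round(var, 4))
```

For a normal curve the process settles at once on Sheppard’s adjustment of one-twelfth in the variance; for a curve with an abrupt terminus the same scheme needs several rounds, as the text indicates.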

It is seen (Table VII) that the value of χ² is reduced by 2.0628 if all occupied panels are
used, and by 2.0216 if the frequencies above 10.95 are grouped. To perceive that the fit has
been greatly improved there is no need to use χ², or any other conventional test. The total
of the absolute deviations, for example, shows the difference equally plainly. It is obvious
that the frequencies in the first two panels agree much better than in Pearson’s fit, while an
examination of the rest of the table shows the two curves as equally successful. Had Pearson
fitted the curve properly by moments, he could have claimed a considerably smaller value
of χ², without falsifying his frequencies.
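The total of the absolute deviations mentioned above is easily computed. The sketch below uses the recalculated expectations of Table II and the approximate moment solution of Table VI, in which the classes from 14.45 upward are pooled; the observed frequencies are pooled in the same way for that comparison. The names are illustrative.

```python
observed = [38, 165, 188, 159, 137, 114, 81, 48, 29, 19, 15, 6, 0, 1, 0, 0]

# Pearson's solution, recalculated (Table II).
pearson = [30.5932, 171.2071, 186.8828, 167.2268, 137.0522, 105.4831,
           76.7232, 52.6443, 33.8230, 20.0724, 10.7604, 5.0217,
           1.9114, 0.5217, 0.0752, 0.0022]

# Approximate moment solution (Table VI); the last entry pools 14.45 upward.
moment_fit = [37.6310, 165.4276, 184.2287, 167.1309, 137.8327, 106.2212,
              77.1005, 52.6778, 33.6630, 19.8760, 10.6280, 4.9815,
              1.9360, 0.5623, 0.1028]
observed_pooled = observed[:14] + [sum(observed[14:])]

def total_absolute_deviation(obs, exp):
    """Sum of |a - m| over the classes."""
    return sum(abs(a - m) for a, m in zip(obs, exp))

dev_pearson = total_absolute_deviation(observed, pearson)
dev_moment = total_absolute_deviation(observed_pooled, moment_fit)
print(round(dev_pearson, 2), round(dev_moment, 2))
```

On these figures the total absolute deviation falls from about 54 to about 43.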



Table VI. A curve approximately satisfying the moment equations

Breaking   Frequency   Frequency   Deviation                Reduction compared with
strength   expected    observed    (a - m)     (a - m)²/m   Pearson’s solution
in g.      (m)         (a)                                  +          -

 0.45       37.6310      38         +0.3690     0.0035      1.7897
 1.45      165.4276     165         -0.4276     0.0012      0.2238
 2.45      184.2287     188         +3.7713     0.0772                 0.0705
 3.45      167.1309     159         -8.1309     0.3956      0.0091
 4.45      137.8327     137         -0.8327     0.0050                 0.0050
 5.45      106.2212     114         +7.7788     0.5697      0.1180
 6.45       77.1005      81         +3.8995     0.1968      0.0409
 7.45       52.6778      48         -4.6778     0.4154                 0.0057
 8.45       33.6630      29         -4.6630     0.6459      0.0418
 9.45       19.8760      19         -0.8760     0.0386      0.0187
10.45       10.6280      15         +4.3720     1.7985                 0.1281
11.45        4.9815       6         +1.0185     0.2082                 0.0176
12.45        1.9360       -         -1.9360     1.9360                 0.0246
13.45        0.5623       1         +0.4377     0.3408      0.0977
14.45 to
16.45        0.1028       -         -0.1028     0.1028                 0.0254

Total     1000.0000    1000          0.0000     6.7352      2.3397     0.2769

Grouped reduction, for successive sets of four classes (the last set containing the
remaining classes): 1.9522, 0.1482, -0.0852, 0.0477.


The algebraic form is 

2/ = 3/o(^ - 0-r)727,7559)O M’»-«»»9-» (16-2979,2562 -rc)« 3*8M835^ 2/o = *422,9195,2. 


Table VII. Values of χ² for three series of expectations

                                   All                     Final classes
                                   classes    Difference   grouped         Difference
Pearson’s frequencies              7.6922     1.1058       5.1151          1.0986
Pearson’s solution recalculated    8.7980     2.0628       6.2137          2.0216
Approximate moment solution        6.7352                  4.1922



V. The error in Koshal and Turner’s paper 

The badness of Pearson’s fit, in an avowedly methodological paper, is due to his reliance
on the abruptness coefficients. Koshal and Turner, following the advice frequently, though
uncritically, given for curves having an abrupt terminus, omitted to use Sheppard’s
adjustments. Had they done so, their fit would have been closer to the moment solution than
that given by Pearson, six years later, and after studying their paper. It is true that their
paper contains at one point numerical errors to which Pearson repeatedly calls attention:

P. 37. “The last two values are in error. On the basis of these erroneous values they
obtain....”

P. 38. “Two of which be it remembered are in error.”

P. 38. “The moments are not corrected and there are blunders in the arithmetic.”






P. 41. “These crude moments are not even correct owing to faulty arithmetic.”

P. 50. “Elderton and Hansemann, not having apparently seen the Koshal-Turner paper,
calculated the crude or unadjusted moments from” Koshal’s table for testing significance
“treating the 7 ‘above 10.95’ as grouped between 10.95-11.95; whereas 6 are actually in that
group and 1 in the group 12.95-13.95. Hence the Elderton-Hansemann raw moments agree
neither with the true raw moments, nor with those provided by Koshal. They suppose
Koshal may have adjusted in some way his moments. Koshal in (c) suggests that the source
of the difference is the treatment of the ‘above 10.95’ group. That difference does in part
arise from this fact, but it also springs from the occurrence of blunders in Koshal’s
arithmetic!”

Pearson gives his readers no opportunity of knowing the magnitude of these so-called
blunders of Koshal and Turner. He evidently, and repeatedly, implies that they discredit
their work. The errors affect the third and fourth moments; when these are measured about
the mean, we have the following values:


Table VIII

                                   3rd moment    4th moment
True moment                        10.6736       103.13273
As given by Koshal and Turner      10.6619       103.11237
Error                              -0.0117        -0.02036
Error, using 4.45 as origin        -0.0117        -0.00117

No quantity perhaps should be thought of as important or unimportant in itself. Certainly
numerical errors are by all means to be avoided. Most statisticians experienced in numerical
work are sufficiently conscious of their own liability to error, not to wish to mount the high
horse where the slips of others are concerned. For the immediate purpose of Koshal and
Turner, who were not writing a methodological paper, but were interested in the distribution
of the breaking strengths, these errors are entirely unimportant. Since many may have been
led by Pearson’s attack to think them serious, it is but fair to compare them with the errors
introduced by Pearson. We may distinguish two sources of these: (i) in the computation
of the expected frequencies, and (ii) due to the adoption of the Pairman and Pearson
“corrections” for abruptness.

(i) Errors in the moments due to Pearson’s faulty computation. Moments of the fre¬ 
quencies given by Pearson, and of those found by recalculation: 


Table IX. Moments about the working mean 4.45

                                  1st        2nd        3rd        4th
Frequencies given by Pearson     -0.4067     5.6968     3.9635     90.1002
Recalculated frequencies         -0.4039     5.6801     4.0553     89.6969
Error                            -0.0028    +0.0167    -0.0918     +0.4033












Pearson’s errors affect all four moments; in the third moment, alone, however, his error is
nearly 8 times that of Koshal and Turner. In the fourth moment it is over 300 times as great.

(ii) Errors in the moments due to Pearson’s use of abruptness coefficients:


Table X

                                  1st        2nd        3rd        4th
Pearson’s solution               -0.4039     5.6801     4.0553     89.6969
True grouped moments             -0.41       5.7280     3.7660     91.2640
Error                            +0.0061    -0.0479    +0.2893     -1.5671


Quite apart from the errors introduced into the first and second moments, Pearson’s
errors due to this cause are, respectively, over 20 and over 1000 times as great as those which
in the work of Koshal and Turner are stigmatized as blunders. The English language does not
seem to possess a word 20 times as forcible as “blunder”. May we hope that some Eastern
tongue known to Koshal is more amply provided!

The statement that the discrepancy between Elderton and Hansmann’s moments and those
given by Koshal and Turner “also springs from blunders in Koshal’s arithmetic” is without
foundation. Elderton and Hansmann only used two moments in their process of fitting, and,
had they been concerned with the others, it should be apparent to anyone wishing to refer
to the point that the arithmetical error discussed above is not only unimportant in magnitude,
but, for what it is worth, it partially annuls, instead of contributing to, the discrepancy!

It is, therefore, apparent that the simple procedure of Koshal and Turner, amended merely
by the inclusion of Sheppard’s adjustments, gives a solution nearer to the true moment
solution than does Pearson’s process; and would do so whether or not the small arithmetical
error, so much enlarged upon by Pearson, were corrected.

It is pathetic to find Pearson defending his curve-fitting procedure on the ground that it 
“trains one in accurate numerical work”. Pearson’s love of accurate numerical work was, 
I think, genuine; and it was his great merit as a statistician that he often tried to put his 
theoretical writings to a numerical test. His example on this point is valuable; whereas he 
was a clumsy mathematician. Had it not been for his arrogant temper, his taste for numerical 
exemplification might well have saved him from serious theoretical mistakes. 

When we are asked: 

“Lastly, how can it be waste of time fitting curves by moments, if such a process enables 
one to see the dangers in classifying material, the need for properly computing moments, 
and trains one in accurate numerical work? ” 

the answer must be that it has no special qualification on the first point, and as regards the 
second and third conditions, Pearson has himself proved, by his own example, that it does 
not in fact fulfil them. He overlooked “the need for properly computing moments” 





through reliance on his abruptness corrections; and he abandoned accurate numerical work 
on finding that inaccurate methods gave him an excuse for displaying his solution with a 
false advantage. 

If Pearson’s object were to find the curve which satisfied the four moment equations, his 
procedure is certainly waste of time. If his object were to obtain efficient estimates of the
four unknown parameters, then to satisfy the moment equations would be waste of time. In 
fact his object must have been to prove by this example that “the method of moments”, 
by which term he refers to his own procedure, is superior to the method of maximal 
likelihood. 

Pearson evidently misinterpreted the aim of Koshal’s paper (1933), of which Koshal was 
the sole author, though Pearson endeavours to give a false impression by referring to it 
constantly as “the Koshal-Fisher paper”. Of this paper Pearson says (p. 36) that it “was 
clearly planned to show how the ‘Method of Moments’ is much inferior to the ‘Method of 
Maximum Likelihood’”. This misapprehension of Pearson’s is evidently the cause of his 
whole attack. 

At the time he wrote his 1933 paper, Koshal knew, as his introductory remarks show, that 
the method of moments gave estimates of the unknown parameters with standard errors 
larger than those of estimates obtained by the method of maximal likelihood; that the 
estimates obtained by the method of moments, even when properly carried out, were
affected by errors of estimation of the same order of magnitude, in large samples, as the
errors of random sampling; that for finely grouped material the efficiency of the method of
moments fell to very low values, for curves so far from the normal as that for his cotton fibres.
If he had wished to test any of these statements, Koshal would, of course, not have relied on
a single example, which would tell him nothing decisive, but on a reinvestigation of the whole
population of samples drawn from a population of specified form. It is to such a population
of samples that the terms “error of random sampling”, etc. refer.

Koshal knew that the theoretical objection to the method of moments had never been 
answered: failing theoretical justification it had been put about that it was a “practical” 
method, while the equations of maximal likelihood were impossible of solution. Koshal 
determined to take the case at its most difficult, i.e. a heavily grouped Pearsonian curve, and 
to show that the direct numerical approach to the solution of maximal likelihood was not 
impracticable. Using some very ingenious simplifications, he ran the work out in a short 
time; too hurriedly, it would seem, for accuracy, for his shot at the solution of maximal
likelihood, though closer than the previous rough fit by moments, has evidently not hit the 
mark. The earlier fit was used as a starting-point merely because it was already available; 
any rough approximation, such as that used by Elderton and Hansmann, would have 
served equally well. 

Koshal’s paper of 1933 was, in fact, not concerned with the general and theoretical 
advantage of satisfying the equations of maximal likelihood, rather than the equations of 
moments, but with overcoming the practical difficulties in the former course. This he did by 




systematic trial in the neighbourhood of the solution. It has been seen that a similar approach 
is needed if the equations of moments are used, though advocates of the method of moments 
seem to have ignored this fact. 

Misunderstanding Koshal’s purpose as he did, it was natural that Pearson’s factious mind 
should plan to do just the opposite, and prove that the method of maximal likelihood was
much inferior to the method of moments. We have seen the means he chose to use for this
purpose. We may marvel at a senior scientist whom, for his years and past work, many in
England and America would be glad to honour, rushing in to prove by a single example
what no single example could possibly prove. 


VI. Satisfying the moment equations

Table XI exhibits the calculation of the moments of the expectations from the approximate 
solution of the method of moments given in Table VI. In order to secure more accurate 
moments a few additional figures have been retained. It will be seen that the fit is by no 
means exact. The remaining errors in the four moments are shown below: 


Table XI. Calculation of grouped moments of approximate solution 

 Expectation          1st          2nd          3rd           4th
   37.6309,7    -150.5239     602.0956   -2408.3821     9633.5283
  165.4275,6    -496.2827    1488.8480   -4466.5441    13399.6324
  184.2286,6    -368.4573     736.9146   -1473.8293     2947.6586
  167.1308,8    -167.1309     167.1309    -167.1309      167.1309
  137.8327,4
  106.2211,8     106.2212     106.2212     106.2212      106.2212
   77.1005,1     154.2010     308.4020     616.8041     1233.6082
   52.6777,67    158.0333     474.0999    1422.2997     4266.8991
   33.6629,95    134.6520     538.6079    2154.4317     8617.7267
   19.8760,31     99.3802     496.9008    2484.5039    12422.5194
   10.6280,28     63.7682     382.6090    2295.6540    13773.9243
    4.9815,29     34.8707     244.0949    1708.6644    11960.6511
    1.9360,12     15.4881     123.9048     991.2381     7929.9052
    0.5622,80      5.0605      45.5447     409.9021     3689.1191
    0.0974,09      0.9741       9.7409      97.4090      974.0900
    0.0054,28      0.0597       0.6588       7.2247       79.4713
    0.0000,04                   0.0006       0.0069        0.0829

  999.9999,83   -409.6858    5725.7745    3778.4734    91202.1687
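
The column totals of Table XI, divided by the total frequency, are the grouped moments entered as the "approximate solution" in Table XII. The following is a minimal sketch of that arithmetic. It assumes that the seventeen classes are equally spaced and sit at deviations -4, -3, ..., 12 from the working origin (an inference from the ratios between the printed columns, not something the table states), and the names `expectations`, `deviations` and `grouped_moments` are illustrative only.

```python
# Expected frequencies from the first column of Table XI
# (guard digits after the comma have been merged into the number).
expectations = [
    37.63097, 165.42756, 184.22866, 167.13088, 137.83274, 106.22118,
    77.10051, 52.677767, 33.662995, 19.876031, 10.628028, 4.981529,
    1.936012, 0.562280, 0.097409, 0.005428, 0.000004,
]

# Assumption: class deviations from the working origin, in grouping units.
deviations = list(range(-4, 13))


def grouped_moments(freqs, xs, orders=(1, 2, 3, 4)):
    """Return the moments of the given orders about the working origin."""
    total = sum(freqs)
    return [sum(f * x ** k for f, x in zip(freqs, xs)) / total for k in orders]


moments = grouped_moments(expectations, deviations)
for order, value in zip((1, 2, 3, 4), moments):
    print(order, round(value, 5))
```

Rounded to the number of places used in Table XII, the four values should agree with the row headed "Approximate solution".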


Table XII. Errors of approximate solution 

                            1st        2nd        3rd        4th
 Moments of data         -0.41000     5.7280     3.7660     91.264
 Approximate solution    -0.40969     5.7258     3.7785     91.202
 Error                   +0.00031    -0.0022    +0.0125     -0.062






For the purpose for which the curve has been used, these discrepancies are not serious. 
Its aim was to determine in what way Pearson’s solution differed from one that would satisfy 
the equations of moments. For this we only require that the remaining errors shall be small 
compared with those of Pearson. The four ratios which Pearson’s bear to those above are 
approximately 20, 22, 23, 25. These ratios are high enough to show that, for the purpose of 
comparison with Pearson’s curve, the approximation has been carried far enough. 

On the other hand, the residual discrepancies are still larger than the arithmetical errors 
of Koshal and Turner, to which so much of Pearson’s attention is devoted. 

If it were desired to compare the solution by moments with, for example, curves fitted by 
any efficient method, a considerably higher approximation might be needed. We have only 
carried the process of approximation two steps from Pearson’s solution. A solution based on 
Sheppard’s correction would have given a better start, and saved fully one step. The last 
operation used has diminished the previous discrepancies to less than one-fifth. Two more 
operations of the same kind would therefore suffice to give practically four-figure accuracy 
in all four moments. 

Since for sufficiently small steps the changes in the four ungrouped moments are doubtless 
linear functions of the changes in the four grouped moments, a more methodical procedure, 
which for higher accuracy would also be perhaps quicker, would be to make four trial solu¬ 
tions in the neighbourhood of the starting-point, after the method adopted by Koshal. 

Thus, if our initial discrepancies in the four moments are $A$, $B$, $C$ and $D$, a small (unit) 
change in the first ungrouped moment, leaving the three others unchanged, will cause changes 
in the required direction 
$$a_1,\ b_1,\ c_1,\ d_1;$$
similarly small unit changes in the second, third and fourth ungrouped moments will cause 
changes in the grouped moments 
$$a_2,\ b_2,\ c_2,\ d_2;\qquad a_3,\ b_3,\ c_3,\ d_3;\qquad a_4,\ b_4,\ c_4,\ d_4.$$

Then the required changes in the ungrouped moments $\alpha$, $\beta$, $\gamma$, $\delta$ will be the solutions of the 
four equations 
$$a_1\alpha + a_2\beta + a_3\gamma + a_4\delta = A,$$
$$b_1\alpha + b_2\beta + b_3\gamma + b_4\delta = B,$$
$$c_1\alpha + c_2\beta + c_3\gamma + c_4\delta = C,$$
$$d_1\alpha + d_2\beta + d_3\gamma + d_4\delta = D.$$

The errors of the solution will arise only from non-linearity of the relations between the 
grouped and the ungrouped moments, as in Newton’s method of approximation. A bad start 
may, in such a case, be very troublesome. It is doubtful if Pearson’s attempt at a moment 
solution would have been a sufficiently good start, for the effect of grouping on his curve is 
to increase the mean by about 0.0004, whereas the effect on the true solution seems to be 
nearly -0.0013, i.e. in the opposite direction. Four trial curves in the neighbourhood of the 
solution found using Sheppard’s correction, would, I think, lead to an excellent approximation. 

The nature of the equations obtained in this way may be judged from values, which I have 
not checked, obtained from the work at this stage, namely: 

$$1.128847\alpha + 0.135611\beta + 0.004214\gamma - 0.004526\delta = -0.0003{,}142,$$
$$0.518305\alpha + 1.291830\beta - 0.017936\gamma - 0.006748\delta = +0.0022{,}255,$$
$$2.693647\alpha + 2.833194\beta + 1.130356\gamma - 0.100222\delta = -0.0124{,}734,$$
$$30.936019\alpha + 22.280860\beta - 0.476005\gamma + 0.391347\delta = +0.0618{,}313.$$

The ungrouped moments (about 4.46 as origin) of the solution which these equations indicate 
are as follows: 

       (1)             (2)            (3)            (4)
 -0.4086,6089      5.6315,600      3.9607,00     87.8789,67

which, apart from errors of calculation, ought to give a tolerably close fit to the observed 
moments. 
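
A set of four simultaneous linear equations of this kind can be solved by ordinary elimination. The sketch below is illustrative only: `solve_linear` is a hypothetical helper using Gaussian elimination with partial pivoting, and the coefficients are transcribed from the equations printed above, which the text itself describes as unchecked. The solution is therefore not offered as the author's result, and the check made is simply that the solver reproduces the right-hand sides.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Coefficients as transcribed from the four printed equations.
coefficients = [
    [1.128847, 0.135611, 0.004214, -0.004526],
    [0.518305, 1.291830, -0.017936, -0.006748],
    [2.693647, 2.833194, 1.130356, -0.100222],
    [30.936019, 22.280860, -0.476005, 0.391347],
]
discrepancies = [-0.0003142, 0.0022255, -0.0124734, 0.0618313]

alpha, beta, gamma, delta = solve_linear(coefficients, discrepancies)
residuals = [
    sum(a * x for a, x in zip(row, (alpha, beta, gamma, delta))) - b
    for row, b in zip(coefficients, discrepancies)
]
print(alpha, beta, gamma, delta)
```

Because the relations are only approximately linear, a solution of this kind is one step of an iteration, in the same sense as a step of Newton's method.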


VII. The place of Pearsonian methods in statistical teaching 

The example chosen by Pearson has served in the previous sections to emphasize the 
following points: 

(i) Pearson never supplied a general method adequate for fitting grouped data by 
moments. 

(ii) The method he advocated fits badly in the example of his choice. 

(iii) The solution of the equations of moments gives a much closer fit; and can be found, 
though by a tedious series of approximations. 

(iv) It has long been known, and undisputed, though ignored, that the solution of the 
equations of moments is itself inefficient, i.e. it introduces errors of estimation comparable 
with the unavoidable errors of random sampling. Other methods will, therefore, supply 
better estimates. 

If we add to these the circumstance that the biological worker seldom if ever has reason to 
believe that the population he is sampling really follows a Pearsonian curve, it is apparent 
that the practical utility of Pearson’s methods for biological research has been greatly 
overrated. 

From the point of view of the teaching of statistics in mathematical departments, the 
position is more serious, for though few workers in biology are ever tempted to occupy their 
time “fitting curves by moments”, a number of statistical departments have given the 
subject a considerable place on their schedules. It should be remembered that this place 
must be allotted to it at the expense of attention to other subjects, which also need time if 
skill is to be gained. 





I should like to call attention to the following subjects, which are at present often 
inadequately taught, or entirely neglected: 

(a) Finite differences; the theory of comprehensive computational processes. This is, 
itself, a large field; the notations and formulae used require much practice for proficiency. 
Apart from numerical interpolation, differentiation, and integration, the use of differences 
for checking or for “smoothing” data can only be mastered by means of rather lengthy 
examples. 

(b) The exact treatment of small samples; tests of significance. Fortunately this requires 
no elaborate theory. All the relevant ideas are simple and direct. They ought to be mastered 
at an early age, but, until they find their place in school mathematics, time must be found 
for them at universities. Without them the mathematical student has no chance of 
understanding the structure or treatment of experimental data. 

(c) Analysis of variance and covariance. Familiarity with the immense variety of problems 
reducible to these standard forms can only be gained by wide experience and practice. At 
present it is also necessary to know many of the diverse forms (multiple correlations, 
correlation ratios, etc.) in which the same problems were formerly expressed, so as to 
translate them readily into familiar terms. 

(d) The theory of estimation. Under this item a brief account of Pearson’s procedure 
might, for historical reasons, be included. No student could, however, make any progress in 
the theory without seeing in what ways Pearson’s methods were defective. In the present 
case, for example, he would see that the method of moments, properly carried out, was 
Consistent; from an infinitely large sample it would lead to the correct result; but that 
Pearson’s procedure was not even Consistent. He would be able to prove that the method of 
moments, even when accurately applied, is not an Efficient method; and he would know how 
to propose a number of ways of finding an Efficient solution. In particular, he would know 
that the equations for minimizing $\chi^2$, 
$$\frac{\partial}{\partial\theta}(\chi^2) = 0,$$
for variation of each parameter, $\theta$, tend in large samples to equivalence with the equations 
of maximal likelihood, 
$$\frac{\partial L}{\partial\theta} = 0,$$
where $L$ is the logarithm of the likelihood, and, consequently, that, when all classes are well 
occupied, the method of minimal $\chi^2$ gives an efficient fit. In fitting Pearsonian curves, 
however, ill-occupied classes usually occur at the tails, as in the present example. With curves 
of unlimited range this occurs however large the sample is made, so that the theoretical 
efficiency is never reached. To minimize $\chi^2$ for the data as grouped for testing significance 
is, of course, to fit a curve to data different from those observed. Consequently, for refined 
purposes, $\chi^2$ ceases to be an adequate criterion of the success with which a curve is fitted. 
An adequate background in practical approximative processes will, in any case, enable the 
student to explore in what ways best to attain a solution which would be theoretically 
defensible. 





(e) Practical computation on topics of interest in contemporary research. This should be 
the final object to which university teaching in statistics is directed. Many biological and 
sociological workers are faced with mathematical difficulties which, for various reasons, 
they cannot overcome. Most, naturally, prefer to tackle their own problems, but, even when 
they are sufficiently successful in this, a more expert examination of the problem is often 
very fruitful. Actually, whole bodies of data, or potential data, lie fallow for lack of the 
means of interpreting them. A live centre of statistical teaching will seek out such cases, both 
among their colleagues, and in the literature. 
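
The statement under (d), that minimizing $\chi^2$ and maximizing the likelihood lead to nearly the same estimate when the classes are well occupied, can be illustrated numerically. The sketch below is not taken from the paper: the four-class counts and the one-parameter class probabilities $(2+\theta)/4$, $(1-\theta)/4$, $(1-\theta)/4$, $\theta/4$ are assumptions chosen for illustration, and `minimise` is a plain ternary search, which suffices here because both criteria are convex in $\theta$ on the interval searched.

```python
import math

# Illustrative counts in four classes (an assumption, not data from this paper).
counts = [1997, 906, 904, 32]


def probabilities(theta):
    """Assumed one-parameter model for the four class probabilities."""
    return [(2 + theta) / 4, (1 - theta) / 4, (1 - theta) / 4, theta / 4]


def negative_log_likelihood(theta):
    return -sum(x * math.log(p) for x, p in zip(counts, probabilities(theta)))


def chi_square(theta):
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probabilities(theta)))


def minimise(f, lo, hi, tol=1e-10):
    """Ternary search for the minimum of a convex function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


theta_ml = minimise(negative_log_likelihood, 1e-6, 1 - 1e-6)
theta_chi = minimise(chi_square, 1e-6, 1 - 1e-6)
print(theta_ml, theta_chi)
```

With a sample of this size the two estimates differ by far less than the sampling error of either; in this illustration the class with only 32 observations is the chief source of the difference.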

So long as “fitting curves by moments” stands in the way of students' obtaining proper 
experience of these other activities, all of which require time and practice, so long will it be 
judged with increasing confidence to be waste of time. 

VIII. Addendum 

Friends who have read Pearson’s attack on Koshal have insisted that a number of 
miscellaneous statements ought not to be left unanswered. Out of much more of the same 
kind I have therefore selected a few passages which require comment. 

In an early paper (1921) published in the Transactions of the Royal Society, no less than four 
sections, or 24 pages, were devoted to an examination of the method of fitting Pearsonian 
curves by moments. The efficiency was found to be uniformly low, save for nearly normal 
curves. It is, therefore, most remarkable that Pearson should have ignored this criticism for 
16 years, and, although he quotes this paper frequently in his last contribution, it is clear 
that he still had no grasp of what had been demonstrated. On p. 46 he says: “... we only 
know of the latter that it makes a certain quantity which Professor Fisher has termed the 
Likelihood a maximum, but what advantages the resulting curve gives to the practical 
statistician remains so far unrevealed.” 

The paper in question claimed, and the claim has never been disputed, that the method 
of maximal likelihood leads to Efficient solutions. To quote a formal definition (p. 310): 
“The criterion of Efficiency is satisfied by those statistics which, when derived from large 
samples, tend to a normal distribution with the least possible standard deviation.” 
Inefficient statistics are merely estimates with an unnecessarily low precision. 

Pearson again (p. 45) says: “The very choice of words Maximum Likelihood seems to beg 
this question, although no rigorous proof can be given of the method” (sic) “which bears that 
name, nor is there any proof that we shall reach the minimum $\chi^2$ when we reach the Maximum 
Likelihood.” The reader would not easily judge from these words, that in 1921 I contrasted 
in detail the method of minimizing $\chi^2$ with that of maximizing the likelihood; that the 
contrast was again exemplified in 1925 (Theory of Statistical Estimation), from the point of 
view of loss of information; and that my elementary textbook on Statistical Methods has, 
since the second edition (1928), contained a simple worked example illustrating both the 
close agreement of the two methods, and the slight advantage of maximal likelihood. 




No more quotations are needed, though many could be given, to show that Pearson had 
never put himself in a position to criticize the theory of estimation, even by understanding 
what it was about. On practical procedure he is equally misleading (p. 47): 

“We do learn, however, from the Koshal-Fisher paper that we are first to apply the 
‘inefficient Method of Moments’ before we can apply the Method of Likelihood, thus doubling 
our labours, and making at least a training in fitting curves by moments desirable as a needful 
preliminary to applying the Likelihood method.” 

For any approximative process a starting point is necessary. This may be obtained by 
whatever simple procedure is judged effective. Judicious guessing is advocated by Myers, 
and practised by Elderton and Hansmann. In 1921 I suggested that the method of moments 
would often be serviceable for a first approach. By “training in fitting curves by moments” 
Pearson doubtless means training in the particular methods which he had developed, and 
this would seem to be entirely unnecessary. The simple procedure of equating moments did 
not originate with Pearson, but goes back to Bessel and Gauss, and was largely developed 
by Thiele. For many purposes this process is efficient. Pearson’s contributions lay in applying 
the method to fitting Pearsonian frequency curves, and in his mistaken modification of 
Sheppard’s treatment of grouping. Pearson’s followers seem to be responsible for fathering 
on him the whole process of equating moments, of which it appears he could never appreciate 
the limitations, or master the proper use. 


REFERENCES 

Elderton, W. P. and Hansmann, G. H. (1934). “The improvement of curves fitted by the method of moments.” 
J.R. statist. Soc. 97, 330-3. 

Fisher, R. A. (1921). “On the mathematical foundations of theoretical statistics.” Philos. Trans. A, 222, 
309-68. 

- (1925-36). Statistical Methods for Research Workers, pp. xii + 339. Edinburgh: Oliver and Boyd. 

- (1925). “Theory of statistical estimation.” Proc. Camb. phil. Soc. 22, 700-26. 

Koshal, R. S. (1933). “Application of the method of maximum likelihood to the improvement of curves fitted 
by the method of moments.” J.R. statist. Soc. 96, 303-13. 

- (1936). “Application of the method of maximum likelihood to the derivation of efficient statistics for 
fitting frequency curves.” J.R. statist. Soc. 98, 128. 

Koshal, R. S. and Turner, A. J. (1930). “Studies in the sampling of cotton for the determination of fibre- 
properties.” J. Text. Inst. 21, T326-T370. 

Myers, R. J. (1934). “Note on Koshal's method of improving the parameters of curves by the use of the 
method of maximum likelihood.” Ann. math. Statist. 5, 320-3. 

Pearson, K. (1936). “Method of moments and method of maximum likelihood.” Biometrika, 28, 34-69. 

Thiele, T. N. (1903). Theory of Observations, 143 pp. London: G. and E. Layton. 



30 

MOMENTS AND CUMULANTS IN THE SPECIFICATION OF DISTRIBUTIONS 


AUTHOR’S NOTE 

The paper was intended as a compact summary of the more useful 
properties of those symmetric functions variously known as moments, 
semi-variants, cumulants, etc. To write it was a most enjoyable 
collaboration, since so much seemed worth doing. On re-reading it 
now appears to be a great deal too compact. Without 
expanding the material to its “natural size” (a monograph textbook), 
I have in this edition ventured to ease the compression by 
inserting in a few places rather more explicit explanations. Table III 
has been added also for this edition. 


* Reprinted from Extrait de la Revue de l’Institut International de Statistique, 
Vol. 4, pp. 1-14, 1937. 





MOMENTS AND CUMULANTS IN THE SPECIFICATION OF 
DISTRIBUTIONS. 

By E. A. Cornish and R. A. Fisher, F.R.S. 


1. 

The very considerable statistical literature which has grown up on the use 
of the moments of populations and samples, and on other quantities allied to 
these, is rendered confusing by variations in notation and terminology, and by 
the different aims which authors have had in view in using these quantities. 
The following notes aim at clarifying the subject by suggesting a uniform 
and consistent notation, specifying briefly the relations between the different 
quantities ordinarily used, and summarising the results which have been obtained. 

The distribution of a variable quantity $x$ can be specified by means of 
a frequency function $f$, often termed the probability integral, specifying the 
total frequency in the population for which the variate is less than an 
assigned value $x$. For discontinuous distributions $f$ will be a step function, 
increasing discontinuously at the values of $x$ at which finite fractions of 
the total frequency are concentrated, and remaining constant between these 
values. For certain other distributions $f$ is continuous and differentiable so that 
$df/dx$ represents the frequency density in the element of range $dx$, or the 
ordinate of a frequency curve at this point. These are the two common cases, 
but it is also possible for $f$ to be continuous, but not differentiable, and so 
incapable of representation by a frequency curve. 


2. The Characteristic Function. 

In all cases we may define a function of a real variable $t$ in the form 
$$M(t) = \int_{-\infty}^{\infty} e^{itx}\,df,$$
which is known as the characteristic function of the distribution. The absolute 
value of $M$ never exceeds unity for any real value of $t$. $M(t)$ and $M(-t)$ 
are equal, if real, and conjugate quantities, if complex. If, in the neighbourhood 
of $t = 0$, $M$ can be expanded in a series of powers of $t$, this series will be 
$$1 + \mu'_1\,it + \mu'_2\frac{(it)^2}{2!} + \mu'_3\frac{(it)^3}{3!} + \cdots$$
or 
$$\sum_{r=0}^{\infty}\mu'_r\frac{(it)^r}{r!},$$
where $\mu'_r$ is the $r$th moment of the distribution of $x$ about the origin. The 
characteristic function may, therefore, be spoken of as the moment generating 
function. $\mu'_r$ is, of course, the average value of $x^r$ and when this is finite, the 
characteristic function is differentiable $r$ times at the origin. 

If $\mu'_1$ is the mean, the factor $e^{itx}$ may be resolved into the product 
$$e^{it\mu'_1}\cdot e^{it(x-\mu'_1)},$$
of which the first factor is constant, while the average value of the second 
factor gives a characteristic function referred to the mean of the distribution, 
and therefore formally expansible as a generating function of the moments 
about the mean. The relation between the moments about zero and the moments 
about the mean of the distribution may therefore be obtained by equating 
coefficients of powers of $t$ in the identity 
$$1 + \mu_2\frac{(it)^2}{2!} + \mu_3\frac{(it)^3}{3!} + \mu_4\frac{(it)^4}{4!} + \cdots = e^{-\mu'_1 it}\left(1 + \mu'_1\,it + \mu'_2\frac{(it)^2}{2!} + \mu'_3\frac{(it)^3}{3!} + \mu'_4\frac{(it)^4}{4!} + \cdots\right),$$
giving the series of relations 
$$\mu_2 = \mu'_2 - \mu'^{\,2}_1,$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^{\,3}_1,$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^{\,2}_1 - 3\mu'^{\,4}_1,$$
by which the moments about the mean may be obtained from those about 
any other origin. 
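
These relations are easily applied numerically. The following sketch is an illustration only: the function name `central_from_raw` is hypothetical, and the test distribution (a variate taking the values 0 and 1 with equal frequency, so that every moment about zero is one half) is chosen merely because its moments about the mean are known exactly.

```python
from fractions import Fraction


def central_from_raw(m1, m2, m3, m4):
    """Moments about the mean (orders 2 to 4) from moments about any origin."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


# A variate equal to 0 or 1 with frequency 1/2 each: all raw moments are 1/2.
half = Fraction(1, 2)
two_point = central_from_raw(half, half, half, half)
print(two_point)
```

Exact rational arithmetic is used so that the result can be compared with the known values 1/4, 0 and 1/16 without rounding error.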


3. The Cumulative Function. 

In studying the distributions of quantities compounded of ingredients, 
each distributed independently in a known distribution, Laplace was led to 
introduce a function known as the cumulative function, which is simply the 
logarithm of the characteristic function. 

If $x$ is distributed in a distribution specified by the frequency element 
$df_1$, and $y$ is independently distributed in a distribution specified by the 
element $df_2$, the frequency of the simultaneous occurrence of any particular 
pair of values $x$ and $y$ will be $df_1\,df_2$, and the characteristic function of the 
sum, $x + y$, will be 
$$\int\!\!\int e^{it(x+y)}\,df_1\,df_2,$$
which is clearly the product of the characteristic functions of $x$ and $y$ 
separately. Consequently, if $K = \log M$ be written for the cumulative function, 
the cumulative function of $x + y$ is simply the sum of the cumulative 
functions of $x$ and $y$ separately. Evidently this relationship holds for any 
number of ingredients and is fundamental in the study of distributions of 
compound quantities. 

The identity of these functions for all values of $t$ carries with it the 
identity of their coefficients when $M$ and $K$ are expressible by power series. 
We are therefore led to recognise the coefficients of the expansion of $K$ in 
powers of $t$ as quantities of peculiar significance in the specification of the 
distribution. We shall call these quantities cumulants, denoted by 
$\kappa_1, \kappa_2, \kappa_3, \ldots$, and defined by the identity 
$$\kappa_1\,it + \kappa_2\frac{(it)^2}{2!} + \kappa_3\frac{(it)^3}{3!} + \cdots = \log\left\{1 + \mu'_1\,it + \mu'_2\frac{(it)^2}{2!} + \mu'_3\frac{(it)^3}{3!} + \cdots\right\},$$
by which the moments about zero may be expressed in terms of cumulants, 
or vice versa, or by 
$$\kappa_2\frac{(it)^2}{2!} + \kappa_3\frac{(it)^3}{3!} + \kappa_4\frac{(it)^4}{4!} + \cdots = \log\left\{1 + \mu_2\frac{(it)^2}{2!} + \mu_3\frac{(it)^3}{3!} + \mu_4\frac{(it)^4}{4!} + \cdots\right\},$$
giving the corresponding relations with the moments about the mean. The 
latter are of great simplicity for 
$$\mu_2 = \kappa_2,$$
$$\mu_3 = \kappa_3,$$
$$\mu_4 = \kappa_4 + 3\kappa_2^2,$$
$$\mu_5 = \kappa_5 + 10\kappa_3\kappa_2,$$
and so on. The numerical coefficients may be written down at sight for the 
coefficient of $\kappa_2^2$ is the number of ways in which 4 objects may be divided 
into two groups of 2 each, and that of $\kappa_3\kappa_2$ is the number of ways in which 
5 objects may be divided into a group of 3 and a group of 2. The same rule 
holds generally. 

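For, if 
$$(p_1^{\pi_1}\,p_2^{\pi_2}\cdots)$$
stand for any partition of a number $r$, the term in 
$\kappa_{p_1}^{\pi_1}\kappa_{p_2}^{\pi_2}\cdots$ in the expansion of 
$$\exp\left\{\kappa_2\frac{(it)^2}{2!} + \kappa_3\frac{(it)^3}{3!} + \cdots\right\}$$
is seen to be 
$$\frac{\kappa_{p_1}^{\pi_1}\kappa_{p_2}^{\pi_2}\cdots\,(it)^r}{(p_1!)^{\pi_1}\,\pi_1!\,(p_2!)^{\pi_2}\,\pi_2!\cdots},$$
so that the coefficient of $\kappa_{p_1}^{\pi_1}\kappa_{p_2}^{\pi_2}\cdots$ in $\mu_r$ is 
$$\frac{r!}{(p_1!)^{\pi_1}\,\pi_1!\,(p_2!)^{\pi_2}\,\pi_2!\cdots},$$
or the number of ways of distributing $r$ objects into undifferentiated receptacles, 
$\pi_1$ containing $p_1$ each, $\pi_2$ containing $p_2$ each, and so on. 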
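
The rule for the numerical coefficients may be sketched in a few lines. The function name `partition_coefficient` is hypothetical, and a partition is represented simply as a list of its parts; the values for the partitions of 6 are given as a further instance of the same rule, corresponding to the standard relation $\mu_6 = \kappa_6 + 15\kappa_4\kappa_2 + 10\kappa_3^2 + 15\kappa_2^3$.

```python
from collections import Counter
from math import factorial


def partition_coefficient(parts):
    """Number of ways of dividing sum(parts) objects into unordered groups
    whose sizes are the given parts: r! / prod((p!)^pi * pi!)."""
    r = sum(parts)
    denominator = 1
    for p, multiplicity in Counter(parts).items():
        denominator *= factorial(p) ** multiplicity * factorial(multiplicity)
    return factorial(r) // denominator


print(partition_coefficient([2, 2]))     # coefficient of kappa_2^2 in mu_4
print(partition_coefficient([3, 2]))     # coefficient of kappa_3 kappa_2 in mu_5
print(partition_coefficient([4, 2]),
      partition_coefficient([3, 3]),
      partition_coefficient([2, 2, 2]))  # coefficients in mu_6
```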


3. Average effects of grouping. 

When, by reason of the limited accuracy of instrumental measurements, 
for convenience of record, or to simplify the calculations, variates are grouped 
so that to all values lying in the range $x - \tfrac12 h$ to $x + \tfrac12 h$ is assigned the 
conventional value $x$, the cumulants of the distribution will be somewhat 
affected. 



















To any true value $\xi$, the process of grouping adds a grouping error 
$x - \xi$, where $x$ is the centre of the group in which $\xi$ falls. Knowing the 
group limits, we know also the actual error introduced for each possible 
value of $\xi$. With equal intervals this error will be a periodic function of $\xi$, 
and an exact study of the effects of grouping must involve the phase relationship 
between the group limits and any such characteristic of the population 
as its mean. With moderately fine grouping, the periodic corrections are, however, 
small; and it is often sufficiently accurate to consider only the average 
effects of grouping, when for a given grouping interval, $h$, the group limits 
are supposed to fall with equal frequency in any equal lengths in which the 
interval may be divided. 

In this case the error of grouping is distributed with uniform frequency 
over the range from $-\tfrac12 h$ to $+\tfrac12 h$ so that its frequency distribution is 
$$df = \frac{1}{h}\,dx,\qquad -\tfrac12 h < x < \tfrac12 h,$$
for all values of $\xi$ independently. 

The average cumulants of the grouped distribution will therefore differ 
from those of the original ungrouped distribution by the cumulants of the 
grouping error. 

The characteristic function is 
$$M = \frac{1}{h}\int_{-\frac12 h}^{\frac12 h} e^{itx}\,dx = \frac{2}{ht}\sin\tfrac12 ht.$$

Hence 
$$M = 1 + \frac{h^2}{12}\frac{(it)^2}{2!} + \cdots + \frac{h^{2r}}{(2r+1)\,2^{2r}}\frac{(it)^{2r}}{(2r)!} + \cdots,$$
$$K = \log M = \frac{h^2}{12}\frac{(it)^2}{2!} - \frac{h^4}{120}\frac{(it)^4}{4!} + \frac{h^6}{252}\frac{(it)^6}{6!} - \cdots.$$

Since, with group interval smaller than the standard deviation, the higher 
cumulants expressed in group units increase rapidly, the effects of grouping 
on them are extremely small, even when the second cumulant is materially 
affected. 

The deduction of the coefficients $\dfrac{h^2}{12}$, $-\dfrac{h^4}{120}$, $\dfrac{h^6}{252}$, ... from the 
cumulants estimated from grouped data is equivalent to Sheppard’s adjustments 
of the moments. 
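
The three coefficients can be checked from the even moments of the uniform grouping error, $\mu_{2r} = h^{2r}/\{(2r+1)2^{2r}\}$, using the standard relations $\kappa_4 = \mu_4 - 3\mu_2^2$ and $\kappa_6 = \mu_6 - 15\mu_4\mu_2 + 30\mu_2^3$ for a symmetrical distribution. The sketch below uses exact fractions; the function names are illustrative, and `sheppard_adjust` merely subtracts the grouping cumulants from cumulants supplied by the caller.

```python
from fractions import Fraction


def uniform_even_moment(r, h=Fraction(1)):
    """Moment of order 2r of a uniform error on (-h/2, h/2)."""
    return h ** (2 * r) / ((2 * r + 1) * 2 ** (2 * r))


def grouping_cumulants(h=Fraction(1)):
    """Second, fourth and sixth cumulants of the uniform grouping error."""
    mu2, mu4, mu6 = (uniform_even_moment(r, h) for r in (1, 2, 3))
    k2 = mu2
    k4 = mu4 - 3 * mu2 ** 2
    k6 = mu6 - 15 * mu4 * mu2 + 30 * mu2 ** 3
    return k2, k4, k6


def sheppard_adjust(k2, k4, k6, h):
    """Remove the average effect of grouping from the even cumulants."""
    g2, g4, g6 = grouping_cumulants(Fraction(h))
    return k2 - g2, k4 - g4, k6 - g6


print(grouping_cumulants())
```

With $h = 1$ the function returns 1/12, -1/120 and 1/252, the coefficients of the series for $K$.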


4. The symmetric function of a finite sample of observations of 
which the mean value is k^. 

It is easy to see that the condition that the mean value of a symmetric 
function of the observations shall be equal to one of the cumulants, or some 
function of the cumulants, for samples of all sizes, is sufficient to determine 
the symmetric function completely. This property was, however, long overlooked, 
and the series of statistics which afford unbiassed estimates of the 
cumulants was, in fact, only introduced in connection with a study of the 
sampling distributions of such estimates, which are found to be greatly simplified, 
both in their form and their derivation, by using the appropriate series 
of statistics. Corresponding with any partition 
$$P = (p_1^{\pi_1}\,p_2^{\pi_2}\cdots),\qquad S(\pi) = \rho,\qquad S(p\pi) = w,$$
of the partible number $w$ there exists a monomial symmetric function of a 
sample of $n$ observations. If of the $n$ observations $\pi_1$ are chosen to be raised 
to the power of $p_1$; of the remainder we choose $\pi_2$ to be raised to the power 
of $p_2$, and so on, leaving $n - \rho$ observations not involved, the product of the 
powers of the chosen observations constitutes a typical term of the symmetric 
function. The number of similar terms that can be formed is 
$$\frac{n!}{\pi_1!\,\pi_2!\cdots(n-\rho)!}$$
and the sum of these will be designated by the symbol $G(P)$. 
Thus $G(2^2 1)$ stands for 
$$\tfrac12\,S_r\,S_s\,S_t\,(x_r^2\,x_s^2\,x_t),$$
in which $r$, $s$ and $t$ may take any three different values from 1 to $n$. The 
factor $\tfrac12$ is required since interchange of the values $r$ and $s$ leaves the 
monomial function unaltered. 

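Since the observations are independent, the mean value of any term is 
$$(\mu'_{p_1})^{\pi_1}(\mu'_{p_2})^{\pi_2}\cdots,$$
so that the mean value of $G(P)$ is 
$$\frac{n!}{\pi_1!\,\pi_2!\cdots(n-\rho)!}\,(\mu'_{p_1})^{\pi_1}(\mu'_{p_2})^{\pi_2}\cdots.$$

But we know that 
$$\kappa_w = \sum (-)^{\rho-1}(\rho-1)!\,\frac{w!}{(p_1!)^{\pi_1}\,\pi_1!\,(p_2!)^{\pi_2}\,\pi_2!\cdots}\,(\mu'_{p_1})^{\pi_1}(\mu'_{p_2})^{\pi_2}\cdots,$$
the summation extending over all partitions of $w$. 

Hence to obtain a statistic $k_w$ such that its mean value is $\kappa_w$, it is sufficient to put 
$$k_w = \sum \frac{(-)^{\rho-1}(\rho-1)!}{n(n-1)\cdots(n-\rho+1)}\cdot\frac{w!}{(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots}\,G(P).$$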

The set of symmetric functions most easily calculated from the observations, 
at least if these are grouped, or have many repetitions of the same 
value, are the sums of powers $s_q = S(x^q)$. 

From these, corresponding with any partition $Q$ of the partible number $w$, 
where 
$$Q = (q_1^{\chi_1}\,q_2^{\chi_2}\cdots),$$
it is easy to construct the symmetric function 
$$S(Q) = s_{q_1}^{\chi_1}\,s_{q_2}^{\chi_2}\cdots.$$

To express $G(P)$ in terms of $S(Q)$ we require the bipartitional function 
$Gs(P, Q)$ defined by the identity 
$$G(P) = \sum_Q Gs(P, Q)\,S(Q).$$

$Gs$ is found to be an integer divided by $\pi_1!\,\pi_2!\cdots$; for values of $w$ 
from 2 to 6 the values of $\pi_1!\,\pi_2!\cdots Gs(P, Q)$ are tabulated below. 















w = 2. 
                       S(Q)
 P            (2)   (1²)
 G(2)          1     .
 2G(1²)       -1     1


w = 3. 
                       S(Q)
 P            (3)   (21)  (1³)
 G(3)          1     .     .
 G(21)        -1     1     .
 6G(1³)        2    -3     1


w = 4. 
                       S(Q)
 P            (4)   (31)  (2²)  (21²) (1⁴)
 G(4)          1     .     .     .     .
 G(31)        -1     1     .     .     .
 2G(2²)       -1     .     1     .     .
 2G(21²)       2    -2    -1     1     .
 24G(1⁴)      -6     8     3    -6     1


w = 5. 
                       S(Q)
 P            (5)   (41)  (32)  (31²) (2²1) (21³) (1⁵)
 G(5)          1     .     .     .     .     .     .
 G(41)        -1     1     .     .     .     .     .
 G(32)        -1     .     1     .     .     .     .
 2G(31²)       2    -2    -1     1     .     .     .
 2G(2²1)       2    -1    -2     .     1     .     .
 6G(21³)      -6     6     5    -3    -3     1     .
 120G(1⁵)     24   -30   -20    20    15   -10     1


w = 6. 
                       S(Q)
 P            (6)   (51)  (42)  (3²)  (41²) (321) (2³)  (31³) (2²1²) (21⁴) (1⁶)
 G(6)          1     .     .     .     .     .     .     .     .      .     .
 G(51)        -1     1     .     .     .     .     .     .     .      .     .
 G(42)        -1     .     1     .     .     .     .     .     .      .     .
 2G(3²)       -1     .     .     1     .     .     .     .     .      .     .
 2G(41²)       2    -2    -1     .     1     .     .     .     .      .     .
 G(321)        2    -1    -1    -1     .     1     .     .     .      .     .
 6G(2³)        2     .    -3     .     .     .     1     .     .      .     .
 6G(31³)      -6     6     3     2    -3    -3     .     1     .      .     .
 4G(2²1²)     -6     4     5     2    -1    -4    -1     .     1      .     .
 24G(21⁴)     24   -24   -18    -8    12    20     3    -4    -6      1     .
 720G(1⁶)   -120   144    90    40   -90  -120   -15    40    45    -15     1
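
Any row of these tables can be checked on a small set of numbers by comparing the symmetric function, summed directly, with its expression in sums of powers. The sketch below does this for the last row of the table for $w = 4$, namely $24\,G(1^4) = -6s_4 + 8s_3s_1 + 3s_2^2 - 6s_2s_1^2 + s_1^4$. The function names and the sample values are illustrative assumptions.

```python
from itertools import permutations


def power_sum(xs, q):
    """s_q, the sum of the q-th powers of the observations."""
    return sum(x ** q for x in xs)


def direct_24_g1111(xs):
    """24 G(1^4): products of four different observations, summed over all
    ordered selections (each unordered selection is counted 4! times)."""
    return sum(a * b * c * d for a, b, c, d in permutations(xs, 4))


def tabulated_24_g1111(xs):
    """The same quantity from the coefficients -6, 8, 3, -6, 1 of the table."""
    s1, s2, s3, s4 = (power_sum(xs, q) for q in (1, 2, 3, 4))
    return -6 * s4 + 8 * s3 * s1 + 3 * s2 ** 2 - 6 * s2 * s1 ** 2 + s1 ** 4


sample = [1, 2, 3, 5]
print(direct_24_g1111(sample), tabulated_24_g1111(sample))
```

For a sample of exactly four observations the direct sum is 24 times their product, which gives a further check by hand.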


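In the expression for $k_w$ the coefficient of $G(P)$, for partitions of a 
fixed number of parts $\rho$, is proportional to 
$$\frac{w!}{(p_1!)^{\pi_1}(p_2!)^{\pi_2}\cdots} = \pi_1!\,\pi_2!\cdots a(P),$$
where $a(P)$ is the elementary partitional function 
$$\frac{w!}{(p_1!)^{\pi_1}\,\pi_1!\,(p_2!)^{\pi_2}\,\pi_2!\cdots}.$$

We may therefore use the property of the function $Gs(P, Q)$, namely that 
$$\sum \pi_1!\,\pi_2!\cdots a(P)\,Gs(P, Q),$$
summed over the partitions $P$ having $\rho$ parts, is $a(Q)$ times the coefficient 
of $x^\rho$ in the product, for all parts $q$ of $Q$, of the polynomials 
$$F(q) = x\sum_{r=0}^{q-1}(-x\Delta)^r\,(1^{q-1}),$$
in which $\Delta^r(1^{q-1})$ stands for the $r$th advancing difference of $u^{q-1}$ at $u = 1$. 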














For the smaller values of $q$ we have 

 q    F(q)
 1    x
 2    x - x²
 3    x - 3x² + 2x³
 4    x - 7x² + 12x³ - 6x⁴
 5    x - 15x² + 50x³ - 60x⁴ + 24x⁵
 6    x - 31x² + 180x³ - 390x⁴ + 360x⁵ - 120x⁶
 7    x - 63x² + 602x³ - 2100x⁴ + 3360x⁵ - 2520x⁶ + 720x⁷


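The expression for $k_w$ thus reduces to 
$$k_w = \sum_Q\left\{\sum_\rho\frac{(-)^{\rho-1}(\rho-1)!}{n(n-1)\cdots(n-\rho+1)}\,u_\rho\right\}a(Q)\,S(Q),$$
where $u_\rho$ is given by 
$$F^{\chi_1}(q_1)\,F^{\chi_2}(q_2)\cdots = \sum_\rho u_\rho\,x^\rho.$$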


The process of simplification may be illustrated by determining the coefficient 
of $s_3 s_2^2$ in $k_7$. We have 
$$Q = (32^2),\qquad a(Q) = 105,\qquad F(3)\,F^2(2) = x^3 - 5x^4 + 9x^5 - 7x^6 + 2x^7 = \sum u_\rho\,x^\rho,$$
so that the coefficient of $s_3 s_2^2$ is 
$$a(Q)\sum_\rho\frac{(-)^{\rho-1}(\rho-1)!}{n(n-1)\cdots(n-\rho+1)}\,u_\rho$$
$$= 105\left\{\frac{2!}{n(n-1)(n-2)} + \frac{5\cdot 3!}{n(n-1)(n-2)(n-3)} + \frac{9\cdot 4!}{n(n-1)\cdots(n-4)} + \frac{7\cdot 5!}{n(n-1)\cdots(n-5)} + \frac{2\cdot 6!}{n(n-1)\cdots(n-6)}\right\}$$
$$= \frac{210\,n}{(n-3)(n-4)(n-5)(n-6)}.$$
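
The reduction just illustrated can be verified in exact arithmetic. The sketch below is an illustration under stated assumptions: it builds $F(q)$ from Stirling numbers of the second kind, using the identity $\Delta^{j-1}(1^{q-1}) = (j-1)!\,S(q, j)$, multiplies the polynomials for the parts of $Q$, and evaluates the coefficient of $S(Q)$ in $k_w$ for a given sample size $n$. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def stirling2(q, j):
    """Stirling number of the second kind."""
    if q == j:
        return 1
    if j == 0 or j > q:
        return 0
    return j * stirling2(q - 1, j) + stirling2(q - 1, j - 1)


def f_poly(q):
    """Coefficients of F(q); the list index is the power of x."""
    return [0] + [(-1) ** (j - 1) * factorial(j - 1) * stirling2(q, j)
                  for j in range(1, q + 1)]


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def a_of(parts):
    """Elementary partitional function a(Q)."""
    denominator = 1
    for q, chi in Counter(parts).items():
        denominator *= factorial(q) ** chi * factorial(chi)
    return factorial(sum(parts)) // denominator


def k_coefficient(parts, n):
    """Coefficient of S(Q) in k_w for a sample of n, Q given by its parts."""
    u = [1]
    for q in parts:
        u = poly_mul(u, f_poly(q))
    total = Fraction(0)
    for rho in range(1, len(u)):
        falling = 1
        for i in range(rho):
            falling *= n - i
        total += Fraction((-1) ** (rho - 1) * factorial(rho - 1) * u[rho], falling)
    return a_of(parts) * total


print(f_poly(4))                    # coefficients of x - 7x^2 + 12x^3 - 6x^4
print(k_coefficient([3, 2, 2], 7))  # coefficient of s3 s2^2 in k7 when n = 7
```

The same function gives the familiar coefficient $1/(n-1)$ of $s_2$ in $k_2$, which serves as an independent check on the construction.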








Since of the partitions of 7 only four are free from unitary parts, the complete 
expression for $k_7$ in terms of sums of powers, $s_r$, of deviations from the mean, may 
be quickly found to be 
$$k_7 = \frac{n}{(n-2)(n-3)(n-4)(n-5)(n-6)}\left\{\frac{n(n^3 + 42n^2 + 119n - 42)}{n-1}\,s_7 - 21(n^2 + 13n - 18)\,s_5 s_2 - 35(n^2 + n + 6)\,s_4 s_3 + 210(n-2)\,s_3 s_2^2\right\}.$$

If deviations are not measured from the mean of the sample, these terms are 
unchanged, but the terms corresponding with the eleven partitions which involve 
unitary parts will also be required. Full expressions for the lower values of $k$ 
were given in Moments and Product Moments of Sampling Distributions, Proc. 
London Math. Soc., Ser. 2, 30, pt. 3 (1928). 


5. Transformation of the characteristic function. 

If $\xi$ is any function of $x$ capable of expansion in a power series 
$$\xi(x) = a_0 + a_1 x + a_2 x^2 + \cdots,$$
then the characteristic function of $\xi$ is the average value of $e^{i\tau\xi}$. 
The coefficient of $(i\tau)^r/r!$ is the average value of 
$$(a_0 + a_1 x + a_2 x^2 + \cdots)^r$$
or, if the characteristic function $M(t)$ is differentiable, 
$$\left\{a_0 + a_1\frac{d}{d(it)} + a_2\frac{d^2}{d(it)^2} + \cdots\right\}^r M,\qquad t = 0.$$

Hence, the characteristic function of any function $\xi(x)$ may be expressed 
in terms of that of $x$ in the form 
$$M_\xi(\tau) = e^{\,i\tau\,\xi\left(\frac{d}{d(it)}\right)}M(t),\qquad t = 0.$$

6. The operational properties of the cumulants. 

If the element of frequency is $y\,dx$, and $y$ and its differential coefficients 
vanish at the limits of the range, 
$$\int e^{itx}\,\frac{dy}{dx}\,dx = -it\,M.$$

Hence 
$$\int e^{itx}\,\frac{d^r y}{dx^r}\,dx = (-it)^r M.$$

It thus appears that the cumulative function of the distribution 
$$df = e^{\frac{a_q}{q!}\left(-\frac{d}{dx}\right)^q}\,y\,dx$$
differs from that of 
$$df = y\,dx$$
by 
$$a_q\frac{(it)^q}{q!};$$
or that the operator 
$$e^{\frac{a_q}{q!}\left(-\frac{d}{dx}\right)^q}$$
merely increases the $q$th cumulant by $a_q$. 

The action of the operator $e^{-a_1\frac{d}{dx}}$ merely transforms a function $f(x)$ 
into $f(x - a_1)$; when acting on a frequency function, it thus simply increases 
the mean by $a_1$, leaving the distribution otherwise unchanged. Similarly 
it appears that 
$$e^{\frac{a_2}{2!}\frac{d^2}{dx^2}}$$
simply increases the variance by $a_2$, leaving the mean and other cumulants 
unchanged, as would be done by scattering each element of frequency in a 
normal distribution with variance $a_2$. Similarly, the other operators of the form 
$$e^{\frac{a_q}{q!}\left(-\frac{d}{dx}\right)^q}$$
may be used to adjust any of the other cumulants to desired values. 


7. The probability integral of a distribution having given cumulants. 


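Since the frequency element of the distribution of a variable $\xi$ having 
given cumulants $\kappa_1, \kappa_2, \kappa_3, \dots$ can be represented formally by 

$$df = \exp\left\{-(\kappa_1 - m)\frac{d}{d\xi} + \frac{1}{2}(\kappa_2 - v)\frac{d^2}{d\xi^2} - \frac{\kappa_3}{6}\frac{d^3}{d\xi^3} + \dots\right\} \frac{1}{\sqrt{2\pi v}}\, e^{-\frac{(\xi - m)^2}{2v}}\, d\xi,$$

where m and v are the mean and variance of any normal distribution chosen 
for convenience, we may use this choice to simplify the determination of the 
probability integral. 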

Sometimes the exact values of $\kappa_1$ and $\kappa_2$ may be used; in this case the 

expression will only involve the higher cumulants $\kappa_3$, $\kappa_4$, .... More frequently 

the successive cumulants are expressed in power series of the reciprocal of 
some number n, so that the order of magnitude of xr is that of 









and K„ the ratio of k, to ’ will be of the order of n when r 


If m and v be chosen to agree with the leading terms of the series for kj 

\r 

exceeds 2, and the expansion takes the form 

2 *^ d4 


dp = 6xp + — 


_1 + , 

B de 


1 


2vv 


where a and c are of order n b and d of order n~\ e of order n ^ , 
/ of order n“’*, and so on. 

Expanding the operator and integrating, we have for the frequency less 

than m + i the expansion, of which the first four adjustment terms have 
been retained, 

P 

+ « aHx + 4 * ^ 72 ) 

— s (^-i- a*la 4- y ab ^2 + "** 2^ 120 


+ 144 1296 ^*^0 


'"'(44 I T ^ i 

ae^B +• TiX TiF i Yao “*" “TTr 


144 


144 


1162 


144 


where 


TW 1296 1728 31104 

5 

p = j zd$ 


and $r is the Hermite polynomial given by 
dr 


^« = 
5«== 


or in full, 

— e + 10^»— 15| 

+ 21 |» —105|» + 105^ 




= irZ, 


i. =e - 1 

^4 - 6 ^*+ 3 

=r —15^ + 45|»- 
i, ==:|» — 28 ^ + 210 ^*- 


15 

4201»+ 105 

— + 36 — 3781* 4 1260 — 945 $ __ 45 4- 630 ^ — 31504- 4725 945 

4- 55 — 990 4- 6930— 17325 H 10395 ^ 





8. The expansion for the abscissa corresponding to any given level of probability. 

Although it is sometimes of interest to work out the actual value of the 
probability corresponding with a given deviation, it is of much more general 
utility to know the values of the deviates corresponding with the assigned 
levels of probability. If now we write x for the normal deviate having the 
same probability integral, the difference $\xi - x$ may be found by equating 
the expression above for the probability to 

p—(f —x)« + -l- (f ——^ — 

By equating terms of each order of magnitude in succession, as in the 
inversion of power series, we find the polynomials are much simplified, giving 

( — x = a + A c —1) 

4- -i- 6 « — 3 «c « + d (f — 3 f) - gg e= (4 — 7 f) 

- -^ab + „.c —^bc(54* —3)— -1-ob —J) 

f j4) * ~ ® + 3T5 ~ 

— -^cd (11 — 42 f“ -t- 16) + C> (69 $• — 187 f“ + 62) 

-|-6“4+ ^ahci+ — ± bd (7 — 15 f) — gL oe («> — 4) 

4- ^ / (4" — 10 4* + 15 4) a»c*4 + *<=' (36 4’ — 49 4) 

- (54“ —32 4" + 354) (114* —214) 

" 360 ” “ 324 " 

+ c’d (111 4' — 547 4“ + 456 4) — c‘ (948 4“ — 3628 4» + 2473 4). 

In these expressions it should be noticed that the polynomials involved 
are in the deviate $\xi$. It is in many ways more convenient to use instead 
polynomials in the normal deviate x, corresponding to the probability required. 
This involves an awkward substitution, which may, perhaps, best be carried 
out by observing that if 

$$\xi - x = f(\xi) = f\{x + (\xi - x)\},$$

then it may be written to the required degree of approximation as 

$$f(x) + f(x) f'(x) + f(x) f'^2(x) + \tfrac{1}{2} f^2(x) f''(x)$$

$$+ f(x) f'^3(x) + \tfrac{3}{2} f^2(x) f'(x) f''(x) + \tfrac{1}{6} f^3(x) f'''(x)$$

A*) + 4 ^ /*f*> + k . 



unchanged, changes all points of fixed probability by the same amount. The 
adjustment to the normal deviate x having the required probability integral is: 

a c (x* — 1 ) 


+ T ^gdCx^ — 3x)— (2x^ — 5x) 

be (x* — 1) + 120 ® — 6 X* + 3) — — 5 X* 4- 2) 

1 


+ 324 c» (12 x^ — 53 x* f 17) 

•g- 6 *x — ^ (^* — 3 x) 4- 74 ^ / — 10 X* 4- 15 x) 


1 


72 (10x» —25x) 


1 

384 


d* (3 X® — 24 x» 4- 29 x) 


— j 3(3 cc (2 X® — 17 X® 4 21 x) 4- 2 ^ (14 x® — 103 x» 4- 107 x) 


- 777 (. c* (252 X® ~ 1688 x® 4- 1511 x). 


For numerical work the polynomials in x required may very easily be 
tabulated for chosen levels of probability (Table I). In Table II are given 
the numerical values of the first five Hermite Polynomials over the range 
of levels of probability chosen for Table I. 

8.1. The cumulants of the distribution of the test of significance, z, and 
the approximate values for different levels of significance. 

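For the purpose of obtaining its cumulants, we may write z, which is 
half the logarithm of the ratio of two estimated variances, in the form 

$$z = \tfrac{1}{2}\log(\chi_1^2/n_1) - \tfrac{1}{2}\log(\chi_2^2/n_2).$$

Now the distribution of $\chi_1^2$ is given by 

$$df = \frac{1}{\left(\frac{n_1-2}{2}\right)!}\left(\tfrac{1}{2}\chi_1^2\right)^{\frac{n_1-2}{2}} e^{-\frac{1}{2}\chi_1^2}\, d\left(\tfrac{1}{2}\chi_1^2\right),$$

so that the mean value of 

$$\exp\left\{\tfrac{1}{2} it \log(\chi_1^2/n_1)\right\} = \exp\left\{\tfrac{1}{2} it \log\left(\tfrac{1}{2}\chi_1^2\right) - \tfrac{1}{2} it \log \tfrac{1}{2} n_1\right\}$$

is 

$$\frac{\left(\frac{n_1 + it - 2}{2}\right)!}{\left(\frac{n_1-2}{2}\right)!}\exp\left(-\tfrac{1}{2} it \log \tfrac{1}{2} n_1\right),$$

whence the cumulative function of z is seen to be 

$$K = \log\left(\frac{n_1 + it - 2}{2}\right)! - \log\left(\frac{n_1 - 2}{2}\right)! - \tfrac{1}{2} it \log \tfrac{1}{2} n_1$$

$$\quad + \log\left(\frac{n_2 - it - 2}{2}\right)! - \log\left(\frac{n_2 - 2}{2}\right)! + \tfrac{1}{2} it \log \tfrac{1}{2} n_2 .$$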


TABLE I. 



The expansion of K in powers of it may therefore be found from the 
differential coefficients with respect to n of 

$$\log\left(\frac{n-2}{2}\right)!\,.$$

Now with sufficient approximation 
2 


log 


\ = l {n — 1) log 1 n — i n + ^ log (2 tt) -f 


i\n 


and its successive differential coefficients are: 


111 1 
log in-- 2 - 


1 


‘2n 2n* "^ 3 a® 

t 

2n* 

1 

3 

12 


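From these, writing $r_1$ and $r_2$ for the reciprocals of $n_1$ and $n_2$ respectively, 
we may obtain the cumulants 

$$\kappa_1 = -\tfrac{1}{2}(r_1 - r_2) - \tfrac{1}{6}(r_1^2 - r_2^2),$$

$$\kappa_2 = \tfrac{1}{2}(r_1 + r_2) + \tfrac{1}{2}(r_1^2 + r_2^2) + \tfrac{1}{3}(r_1^3 + r_2^3),$$

$$\kappa_3 = -\tfrac{1}{2}(r_1^2 - r_2^2) - (r_1^3 - r_2^3),$$

$$\kappa_4 = (r_1^3 + r_2^3) + 3(r_1^4 + r_2^4),$$

$$\kappa_5 = -3(r_1^4 - r_2^4),$$

$$\kappa_6 = 12(r_1^5 + r_2^5).$$

If we write $\sigma$ for the sum $(r_1 + r_2)$ and $\delta$ for the difference $(r_1 - r_2)$ 
of these reciprocals, and if we choose 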

m = 0 , V"iir 


then 


8ar 

<r* 4- 3 8’ 




O =— \/ ^(i8 + 4 Sit). 

6 = i („ + -^) + -g- ( 

c=- j/Tjs + i(3*, + ^i-)j. 

d = („+ .3^) + 68« + 

f = 6 (*• + 10 8* + 



For any special values of $n_1$ and $n_2$ the six cumulant adjustments can 
be evaluated numerically, and the values substituted in the general formula 
given above. It is, however, also of interest to make the substitution algebraically 
and obtain the general form of the z values at different levels of significance 
in terms of $\sigma$ and $\delta$. We then have 

— i 8 (I* + 2) 

1 y— I »• 1 ** I 

+ y^'r |^(*' + 3*) + (** + ll®)| 

- ^(x‘ + 9x« + 8) + ^^ (3x‘ + 7x>-16) 

+ I + 15x) + - 2 ^ (!•' + 44x» + 183x) 

+ 28“ 1613 X) j. 

Example. 

When $n_1 = 24$ and $n_2 = 60$ the 5 % value of z is .26534844. The value 
obtained from the first approximation and the four correction terms above is 

.2809 1224 - .0196 0643 + (.0038 9559 + .0005 7292) - (.0004 8210 - .0000 0206) 
+ (.0000 3805 + .0000 1886 - .0000 0046) 

= .2653 5073. 

The successive errors in this case diminish progressively, with alternating sign. 
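The arithmetic of this example is easy to re-check. The short sketch below simply re-adds the printed terms, grouped by degree as in the text, and compares each running total with the exact value quoted above; the variable names are illustrative and nothing beyond the printed figures is assumed.

```python
# Contributions of each degree in the worked example (n1 = 24, n2 = 60,
# 5% level), grouped exactly as printed in the text.
TERMS_BY_DEGREE = [
    0.28091224,
    -0.01960643,
    0.00389559 + 0.00057292,
    -(0.00048210 - 0.00000206),
    0.00003805 + 0.00001886 - 0.00000046,
]
EXACT_Z = 0.26534844  # exact 5% value of z quoted in the text


def running_totals(terms):
    """Return the partial sums after each degree of correction."""
    totals, partial = [], 0.0
    for term in terms:
        partial += term
        totals.append(partial)
    return totals


totals = running_totals(TERMS_BY_DEGREE)
errors = [total - EXACT_Z for total in totals]

for degree, (total, error) in enumerate(zip(totals, errors)):
    print(f"degree {degree}: total = {total:.8f}, error = {error:+.8f}")
```

The errors printed by this sketch alternate in sign and shrink at every step, which is the behaviour the text describes.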


Summary: The cumulants of a distribution are defined and their relations with the 
ordinary moments are indicated. The importance of these functions in the 
description of distributions is emphasised. 

The symmetric function of a finite sample of observations whose mean value 
is equal to the corresponding cumulant of the population is defined, and a 
convenient expression for its evaluation is developed in terms of sums of powers 
of the observations of the sample. 

The operational properties of the cumulants are discussed, and the expansion of the 
probability integral of a distribution having given cumulants is derived from them. This 
expansion is transformed into a more convenient expression, which gives the value of the 
deviate corresponding to a given level of probability as a function of the normal 
deviate having the same probability integral. The numerical values of the polynomials 
in the normal deviate occurring in the latter expansion are calculated for 
certain chosen levels of probability. In addition, the numerical values of the first 5 
Hermite polynomials are given for the same levels of probability. An example of 
the use of the formulae developed is given by the calculation of the approximate value 
of z for the 5 per cent level of probability. 



TABLE III 

Degree   Successive Terms   Successive Totals   Successive Errors 
  0         .28091224          .28091224          +.01556380 
  1        -.01960643          .26130581          -.00404263 
  2        +.00446851          .26577432          +.00042588 
  3        -.00048004          .26529428          -.00005416 
  4        +.00005645          .26535073          +.00000229 





31 

THE WAVE OF ADVANCE OF ADVANTAGEOUS 
GENES 


AUTHOR’S NOTE 

This is an isolated paper which I have not followed up either practi¬ 
cally or theoretically. It seemed essential to examine the properties 
of the differential equation determining gene spread in the simplest 
case, and it was a pleasure to find that the very empirical process 
used for the computation of the fundamental function would work 
successfully. The quantitative solutions found were strongly 
suggestive for comparisons with the observable facts, especially in 
littoral organisms. 


Reprinted from Annals of Eugenics, Vol. VII, Pt. IV, pp. 355-369, 1937. 



THE WAVE OF ADVANCE OF ADVANTAGEOUS GENES 

By R. A. FISHER, Sc.D., F.R.S. 


I. The problem of gene dispersion 

Consider a population distributed in a linear habitat, such as a shore line, which it occupies 
with uniform density. If at any point of the habitat a mutation occurs, which happens to be 
in some degree, however slight, advantageous to survival, in the totality of its effects, we 
may expect the mutant gene to increase at the expense of the allelomorph or allelomorphs 
previously occupying the same locus. This process will be first completed in the neighbourhood 
of the occurrence of the mutation, and later, as the advantageous gene is diffused into 
the surrounding population, in the adjacent portions of its range. Supposing the range to be 
long compared with the distances separating the sites of offspring from those of their 
parents, there will be, advancing from the origin, a wave of increase in the gene frequency. 
We may first on the simplest possible postulates consider the motion of this wave. 

Let p be the frequency of the mutant gene, and q that of its parent allelomorph, which we 
shall suppose to be the only allelomorph present. Let m be the intensity of selection in favour 
of the mutant gene, supposed independent of p. Suppose that the rate of diffusion per 
generation across any boundary may be equated to 

$$-k\frac{\partial p}{\partial x}$$

at that boundary, x being the co-ordinate measuring position in the linear habitat. Then p 
must satisfy the differential equation 

$$\frac{\partial p}{\partial t} = k\frac{\partial^2 p}{\partial x^2} + mpq, \qquad (1)$$

where t stands for time in generations. 

The constant k is a coefficient of diffusion analogous to that used in physics. Its use should 
be appropriate in many cases. In all real cases we may expect irregularities due to k varying 
at different points of the range, due to variations in the density of the population, and to 
variation in the selective advantage of the mutant at different places. Further, the means 
of diffusion may involve an unequal drift in opposite directions, so that some parts of the 
range predominate as centres of multiplication and others as centres of extinction. The 
effects of all such complications can only be discussed by reference to the course of events 
when they are absent. The purpose of equation (1) is to specify the simplest possible con¬ 
ditions. 

The use of the analogy of physical diffusion will only be satisfactory when the distances of 
dispersion in a single generation are small compared with the length of the wave. In reality 
diffusion is a complex process, compounded often of the diffusion of gametes, and that of 






larvae, in addition to adult forms; a more exact treatment than that supplied by a simple 
coefficient would involve the interaction of these components, and the stages at which the 
selective advantage was enjoyed. So far as it is applicable, the analogy of physical diffusion, 
therefore, greatly simplifies the problem. 

With respect to the assumed independence of m from p, this is effectively to assume that 
there is no dominance in respect of the selective advantage enjoyed. Apart from its simplicity 
this is also, in the author’s opinion, the most important case to consider, in respect to advan¬ 
tageous mutations occurring in nature. There are, at least, plausible reasons for supposing 
that the common recessiveness of observed mutations is a characteristic of harmful muta¬ 
tions, which have long been appearing in the species with relatively high mutation rates, 
whereas beneficial mutations must, at the time of their establishment, occur with exceedingly 
low mutation rates, and have rarely appeared before in the recent history of the species. On 
these grounds dominance would be expected to be absent, and its absence is made the more 
probable by the fact that in most cases the quantitative effect of beneficial mutations must 
be extremely small. For the same reason the selective intensity m is taken to be a small 
quantity, so that p may be taken to vary continuously with time, and not discontinuously 
from generation to generation. 


II. Waves of stationary form 


If we seek for a solution of (1) representing a wave of stationary form advancing with 
velocity v, we may put 

$$\frac{\partial p}{\partial t} = -v\frac{\partial p}{\partial x}$$

and obtain the differential equation (2) involving only one independent variable: 

$$k\frac{d^2p}{dx^2} + v\frac{dp}{dx} + mpq = 0. \qquad (2)$$

Since the variable x does not appear explicitly, we may write, for the frequency gradient, 

$$g = -dp/dx,$$

whence 

$$\frac{d^2p}{dx^2} = -\frac{dg}{dx} = g\frac{dg}{dp},$$

and so find the relation between g and p, 

$$kg\frac{dg}{dp} - vg + mpq = 0. \qquad (3)$$

At the point of inflexion dg/dp = 0, and vg = mpq; in advance of this point dg/dp is positive. 
If g/p tends to a limit u as p tends to zero, then u must satisfy the equation 

$$ku^2 - vu + m = 0,$$

a quadratic equation in u, which has real roots only if $v^2$ is not less than 4km; but g/p cannot 
tend to zero for vg > mpq, and cannot tend to infinity because v > k dg/dp. Hence solutions 
only exist for which the velocity of propagation is equal to, or exceeds, $2\sqrt{km}$. 
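The condition on the velocity can be checked for any numerical case. The following is a minimal sketch, assuming only the quadratic $ku^2 - vu + m = 0$ stated above; the function names are hypothetical and chosen for illustration.

```python
import math


def limiting_gradient_ratios(k, m, v):
    """Real roots u of k*u**2 - v*u + m = 0, or None when none exist.

    u is the limit of g/p as p tends to zero; real roots require
    v**2 >= 4*k*m, that is v >= 2*sqrt(k*m).
    """
    disc = v * v - 4.0 * k * m
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((v - root) / (2.0 * k), (v + root) / (2.0 * k))


def minimal_velocity(k, m):
    """Smallest velocity for which the quadratic has real roots."""
    return 2.0 * math.sqrt(k * m)


print(minimal_velocity(1.0, 1.0))               # 2.0
print(limiting_gradient_ratios(1.0, 1.0, 1.9))  # None: below the minimum
print(limiting_gradient_ratios(1.0, 1.0, 2.5))  # (0.5, 2.0)
```

With k = m = 1 and v = 2.5 the two roots are 0.5 and 2, a reciprocal pair, since the product of the roots is m/k.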






Writing $\lambda = \sqrt{k/m}$, 

$$v = \left(c + \frac{1}{c}\right)\sqrt{km},$$

then equation (3) may be written 

$$\lambda^2 g\frac{dg}{dp} - \left(c + \frac{1}{c}\right)\lambda g + pq = 0. \qquad (4)$$

Or, if $\lambda g = pqz$, 

$$pq\frac{dz}{dp} + (1-2p)z - \left(c + \frac{1}{c}\right) + \frac{1}{z} = 0, \qquad (5)$$

where c is any positive number, conventionally taken to be less than 1. 

In the especially interesting case of minimal velocity, equation (5) may be written 

$$pq\frac{dz}{dp} = 2pz - \frac{(1-z)^2}{z}; \qquad (5a)$$

this case, when c = 1, we may call (a). If c lies between 1 and $\sqrt{1/2}$, we have a range of cases, 
which may be called case (b). When $c = \sqrt{1/2}$ (case c), a second case of special interest arises 
with the equation 

$$pq\frac{dz}{dp} + (1-2p)z - \frac{3}{\sqrt{2}} + \frac{1}{z} = 0, \qquad (5c)$$

and having a velocity of propagation $\sqrt{9/8}$ times the minimum. Finally, in case (d), c is less 
than $\sqrt{1/2}$. 


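III. Particular cases 

When p = 1, the only positive value of z for which dz/dp is finite is the positive root of the 
equation 

$$z^2 + \left(c + \frac{1}{c}\right)z - 1 = 0$$

or 

$$z = \frac{1}{2c}\left\{\sqrt{c^4 + 6c^2 + 1} - (c^2 + 1)\right\} = \alpha.$$

In the neighbourhood of all other values dz/dp increases inversely to (1 - p), so that no 
other finite value is admissible at p = 1. In general, at this extremity 

$$(1-p)\,z\frac{dz}{dp} = z^2 + \left(c + \frac{1}{c}\right)z - 1,$$

$$\frac{dp}{1-p} = \frac{z\,dz}{z^2 + \left(c + \frac{1}{c}\right)z - 1} = \left(\frac{\alpha}{z-\alpha} - \frac{\beta}{z-\beta}\right)\frac{dz}{\alpha-\beta},$$

$$-\log q = \frac{1}{\alpha-\beta}\log\frac{(z-\alpha)^{\alpha}}{(z-\beta)^{\beta}} + A,$$

$\beta$ being the negative root of the same equation; as $-\log q \to \infty$ this cannot be 
satisfied for any finite value of A. 

The only admissible solution is therefore that for which z = $\alpha$, at the limit when p = 1. 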
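The terminal value is simple to evaluate numerically. The sketch below assumes the closed form $\alpha = \{\sqrt{c^4 + 6c^2 + 1} - (c^2 + 1)\}/2c$ for the positive root of $z^2 + (c + 1/c)z - 1 = 0$; the function name is illustrative only.

```python
import math


def terminal_z(c):
    """Positive root of z**2 + (c + 1/c)*z - 1 = 0, the value of z at p = 1."""
    return (math.sqrt(c ** 4 + 6.0 * c ** 2 + 1.0) - (c ** 2 + 1.0)) / (2.0 * c)


for c in (1.0, math.sqrt(0.5), 0.5):
    alpha = terminal_z(c)
    residual = alpha ** 2 + (c + 1.0 / c) * alpha - 1.0
    print(f"c = {c:.6f}: alpha = {alpha:.7f}, residual = {residual:.1e}")
```

For c = 1 this gives $\sqrt{2} - 1$, and for each value tried the root comes out smaller than c.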
When p = 0, we have in case (a) 

$$pq\frac{dz}{dp} = 2pz - \frac{(1-z)^2}{z}.$$







For positive values of z, the right-hand side is positive when 

$$(2p-1)z^2 + 2z - 1$$

is positive; this is zero when z is 

$$\frac{-1 \pm \sqrt{2p}}{2p-1}.$$

When p > 1/2 there is only one positive root. This root, $z_1$, decreases (as p passes from 0 to 1) from 
1 to $(\sqrt{2} - 1)$, which are the terminal values of z. Since dz/dp is positive when $z > z_1$ (apart 
from a region of higher values when p < 1/2), z can never exceed $z_1$ for intermediate values of 
p, for positive values of its derivative can never allow it to pass out of the region of positive 
values, so as to decrease to its final value. Consequently, in the neighbourhood of p = 0, z 
must decrease even more rapidly than $z_1$. For small values of p therefore dz/dp must tend 
to a negative infinity. The differential equation to be satisfied in this region is 

$$p\frac{dz}{dp} = -\frac{(1-z)^2}{z},$$

$$\frac{dp}{p} = \frac{-z\,dz}{(1-z)^2} = \frac{dz}{1-z} - \frac{dz}{(1-z)^2},$$

whence 

$$\log p = -\log(1-z) - \frac{1}{1-z} + A,$$

or 

$$p = \frac{e^{A}}{1-z}\,e^{-1/(1-z)}, \qquad (7a)$$

where the constant of integration A is that which carries the solution to the terminal value 
$z = \sqrt{2} - 1$, at p = 1. 

In case (b), since dz/dp is positive for all values of p when z lies between c and 1/c, and 
since the terminal value $\alpha$ is less than c, we must take z = c at p = 0; we need a negative 
value of dz/dp at the terminus, satisfying the equation 

$$p\frac{dz}{dp} = 2cp - \frac{1-c^2}{c^2}(c - z), \qquad (6b)$$

where $c^2 > \tfrac{1}{2}$. Writing the equation in the form 

$$\frac{dz}{dp} - \frac{1-c^2}{pc^2}\,z = 2c - \frac{1-c^2}{cp},$$

or 

$$\frac{d}{dp}\left(z\,p^{1-1/c^2}\right) = 2c\,p^{1-1/c^2} - \frac{1-c^2}{c}\,p^{-1/c^2},$$

it appears that 

$$z\,p^{1-1/c^2} = \frac{2c^3}{2c^2-1}\,p^{2-1/c^2} + c\,p^{1-1/c^2} - B,$$

in which again, the first term on the right-hand side is to be omitted, giving the solution 

$$c - z = B\,p^{1/c^2 - 1}. \qquad (7b)$$

Since the power of p is less than unity, dz/dp is still infinite at the limit. 








In the special case (c), where $1/c^2 = 2$, we find on integration 

$$z/p = 2c\log p + c/p - C,$$

or 

$$\sqrt{\tfrac{1}{2}} - z = p\,(C - \sqrt{2}\log p), \qquad (7c)$$

tending to zero with p, but still with an infinite derivative. 

Finally in case (d) 

$$c - z = \frac{2c^3}{1-2c^2}\,p.$$

In this case the constant of integration is associated with a negligible term. In fact by 
expanding c - z in powers of p as 

$$z = c - \frac{2c^3}{1-2c^2}\,p + \beta p^2 + \gamma p^3 + \dots$$

and substituting, we have successive equations for $\beta$, $\gamma$, ..., i.e. 

^c»(3-4c*) 

^ (l~3c*)(l-2c*)** 

8c’(5-llc* + Hc«) 

^ ^ (1 - 3c*) (1 - 4c*) (1 - 2c*)® * 

We have thus an expansion for z as a power series in p, a form of expansion which fails at 
the singular values $c^{-2} = 3, 4, 5, \dots$; showing, nevertheless, that when $c^{-2} > 2$, the solution 
having z = c for p = 0 is unique. 

IV. The ambiguity of velocity 

The most striking point about equation (2) is that the velocity of advance of the mutant 
factor appears to be indeterminate. If, for example, any part of the range were filled 
with the mutant form, and the zone of transition were artificially given frequencies with 
the low gradient of gene ratio appropriate to a high velocity, the mutation would spread 
with a higher velocity than if the initial gradient had been higher, and would continue to 
spread indefinitely with this higher velocity so long as uniform conditions were encountered. 
Common sense would, I think, lead us to believe that, though the velocity of advance might 
be temporarily enhanced by this method, yet ultimately, the velocity of advance would 
adjust itself so as to be the same irrespective of the initial conditions. If this is so, equation 
(2) must omit some essential element of the problem, and it is indeed clear that while a 
coefficient of diffusion may represent the biological conditions adequately in places where 
large numbers of individuals of both types are available, it cannot do so at the extreme front 
and back of the advancing wave, where the numbers of the mutant and the parent gene 
respectively are small, and where their distribution must be largely sporadic. 
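As a rough numerical illustration of how the front behaves when it starts from a sharp step, equation (1), $\partial p/\partial t = k\,\partial^2 p/\partial x^2 + mpq$, can be integrated directly. The sketch below uses an explicit finite-difference scheme, which is not a method used in the paper; the grid spacing, time step, domain length, initial step position and the choice of the level p = 1/2 to mark the front are all assumptions made for illustration.

```python
def simulate_front_speed(k=1.0, m=1.0, dx=0.5, dt=0.02, length=160.0,
                         t_first=30.0, t_second=60.0):
    """Average speed of the p = 1/2 point between t_first and t_second."""
    n = int(length / dx) + 1
    # Assumed initial state: mutant gene fixed for x < 10, absent elsewhere.
    p = [1.0 if i * dx < 10.0 else 0.0 for i in range(n)]
    r = k * dt / (dx * dx)  # explicit scheme needs r <= 0.5 for stability

    def half_point(profile):
        """Position where the profile crosses 1/2, by linear interpolation."""
        for i in range(n - 1):
            if profile[i] >= 0.5 > profile[i + 1]:
                frac = (profile[i] - 0.5) / (profile[i] - profile[i + 1])
                return (i + frac) * dx
        raise ValueError("front not found")

    steps_first = round(t_first / dt)
    steps_second = round(t_second / dt)
    x_first = None
    for step in range(1, steps_second + 1):
        new = p[:]
        for i in range(1, n - 1):
            diffusion = r * (p[i + 1] - 2.0 * p[i] + p[i - 1])
            selection = dt * m * p[i] * (1.0 - p[i])
            new[i] = p[i] + diffusion + selection
        new[0], new[-1] = new[1], new[-2]  # no flux through the ends
        p = new
        if step == steps_first:
            x_first = half_point(p)
    return (half_point(p) - x_first) / (t_second - t_first)


speed = simulate_front_speed()
print(f"measured front speed: {speed:.3f}  (2*sqrt(k*m) = 2.000)")
```

With k = m = 1 the measured speed can be compared with $2\sqrt{km} = 2$; in a sketch of this kind it should come out close to that value, with some shortfall due to the finite run time and the coarse grid.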

The effect of chance at the advancing front may be calculated by considering an aggregate 
of discrete particles, which increase in number with a relative growth rate m, as at the wave 
front of our original problem, but are free also to increase in numbers indefinitely in the 
interior of their range. We shall suppose them to be scattered at small unit intervals of time 




Fig. 1. Progressive wave of increase of frequency of advantageous genes. A, median of heterozygotes, p = 0.559, 
x = -0.194. B, point of inflexion, at which the rate of change of gene frequency is greatest, p = 0.442, 
x = 0.766. C, point at which change in gene frequency is most easily detected, p = 0.377, x = 1.297. The zero 
of the abscissa, x, is the point at which the number of mutant genes in front is equal to the number of 
parent genes behind, p = 0.536, x = 0. 

Fig. 2. Distribution of heterozygotes in relation to curve of increase of frequency of advantageous genes. 
Median x = -0.194; mode x = +0.296. 

so that the displacements of the particles at each scattering are distributed independently 
in the normal curve 

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,dx;$$

then k of our previous notation will correspond to $\tfrac{1}{2}\sigma^2$. 

Whatever may be the original distribution, in one dimension, of the particles, we may 
specify it by means of the characteristic function 

$$M(t) = S(e^{tx}),$$

where S stands for summation over all the particles, and x for the co-ordinate of any one 
of them. The effect of the dispersion of the particles is now merely to multiply M by the 
factor $e^{\frac{1}{2}\sigma^2 t^2}$ at unit intervals of time, while the effect of multiplication of the particles is to 
multiply it by $e^m$. If K stands for log M, it appears then that K increases uniformly with time 
at a rate $m + \tfrac{1}{2}\sigma^2 t^2$; after time T 

$$K(T) = K(0) + T(m + \tfrac{1}{2}\sigma^2 t^2).$$

If the process be continued for a long time, the form of K will be determined by the ever-increasing 
second term, and the distribution will tend to the normal form, with variance $\sigma^2 T$ 
and total number proportional to $e^{mT}$. Let us now draw a line beyond which a large but 
constant number of particles have already advanced, and consider with what velocity this 
line will move forward. The proportion, P, of the population beyond this line will fall off 
proportionately with $e^{-mT}$; but if $\xi$ is the ratio of its distance from the centre to the standard 
deviation 

$$P = \frac{1}{\xi\sqrt{2\pi}}\,e^{-\frac{1}{2}\xi^2}$$

approximately, when P is small, whence it appears that $\xi^2$ differs from 2mT by a constant, 
and by the logarithm of $\xi^2$, or, in other words, that $\xi/\sqrt{2mT}$ tends to unity as a limit. But 
the ratio of the standard deviation to $\sigma\sqrt{T}$ also tends to unity. Hence the distance of our 
arbitrary line from the centre bears a ratio to $\sigma T\sqrt{2m}$, which tends to unity. Evidently 
the front advances finally with constant velocity given by $\sigma\sqrt{2m}$, or putting $\sigma = \sqrt{2k}$, 
with velocity $2\sqrt{km}$, which is the minimal velocity consistent with equation (2). The conditions 
at the front of the wave are the same in both cases, save that the diffusion of a 
continuous variable has been replaced by the random dispersion of discrete particles, and 
when this is done it is seen that only one velocity of advance is ultimately possible. 
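The limiting argument can be followed numerically. The sketch below takes the particles to be normally distributed with standard deviation $\sigma\sqrt{T}$, solves for the deviate $\xi$ at which the upper-tail proportion equals $e^{-mT}$, and reports the ratio of the resulting distance to $\sigma T\sqrt{2m}$. Setting the constant of proportionality to unity, the use of bisection, and the function names are assumptions made for illustration.

```python
import math


def log_upper_tail(xi):
    """Log of the normal probability of exceeding xi standard deviations."""
    return math.log(0.5 * math.erfc(xi / math.sqrt(2.0)))


def front_deviate(m, T, lo=0.0, hi=36.0):
    """Solve log P(xi) = -m*T for xi by bisection (P falls off as exp(-m*T))."""
    target = -m * T
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_upper_tail(mid) > target:
            lo = mid  # tail still too large: the line lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


def velocity_ratio(m, sigma, T):
    """Distance of the line from the centre, divided by sigma*T*sqrt(2m)."""
    distance = front_deviate(m, T) * sigma * math.sqrt(T)
    return distance / (T * sigma * math.sqrt(2.0 * m))


for T in (5000.0, 20000.0, 40000.0):
    print(f"T = {T:7.0f}: ratio = {velocity_ratio(0.01, 1.0, T):.4f}")
```

The ratio approaches unity from below, and only slowly, because of the logarithmic term noted in the text.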


V. The tabulation of the wave form for c = 1 

It has been shown in Section III that whereas innumerable solutions of the equation pass 
through the point p = 0, z = 1, only one passes through the other terminal point p = 1, 
$z = \sqrt{2} - 1$. Starting from this point therefore, it should be possible to obtain the numerical 
value of z for each value of p from 0 to 1, and so to construct the wave. 

The process was carried out in three stages: (a) An expansion of z in terms of p was obtained 
for the immediate neighbourhood of p = 1. (b) At any point on the curve dz/dp was calculated 
from the differential equation, and from this, and preceding values, the next point on the 
curve was obtained. (c) Since dz/dp tends to infinity as p tends to zero, at a certain stage a 
series of values of p for given z were obtained by interpolation, and the process continued 
using dp/dz instead of dz/dp. 

If 

$$z = \sqrt{2} - 1 + aq + bq^2 + cq^3 + \dots$$

when q is small, by substitution in the differential equation, and equating powers of q, we find 

$$a = \frac{2(\sqrt{2}-1)}{5+2\sqrt{2}} = 0.10582293,$$

$$b = \frac{2(11\sqrt{2}-1)}{(5+2\sqrt{2})^2(6+2\sqrt{2})} = 0.0538084,$$

$$c = \frac{16(35+38\sqrt{2})}{(5+2\sqrt{2})^3(6+2\sqrt{2})(7+2\sqrt{2})} = 0.0341074,$$

while the fourth coefficient is numerically about 0.02417. These suffice to give seven-figure 
accuracy up to q = 0.06, or numerically 


Table I. Values of z calculated from terminal expansion 

   p          z 
  1.00   0.4142 136 
  0.99   0.4152 772 
  0.98   0.4163 518 
  0.97   0.4174 376 
  0.96   0.4185 3482 
  0.95   0.4196 4365 
  0.94   0.4207 6434 
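The terminal expansion is easily evaluated by machine. The sketch below takes the first three coefficients in closed form as $a = 2(\sqrt{2}-1)/(5+2\sqrt{2})$, $b = 2(11\sqrt{2}-1)/\{(5+2\sqrt{2})^2(6+2\sqrt{2})\}$ and $c = 16(35+38\sqrt{2})/\{(5+2\sqrt{2})^3(6+2\sqrt{2})(7+2\sqrt{2})\}$, a reading of the printed expressions that reproduces the decimal values 0.10582293, 0.0538084 and 0.0341074, and uses the approximate value 0.02417 quoted for the fourth. The constant and function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)
ALPHA = SQRT2 - 1.0  # terminal value of z at p = 1

# Closed forms assumed for the first three coefficients (see lead-in).
A = 2.0 * (SQRT2 - 1.0) / (5.0 + 2.0 * SQRT2)
B = 2.0 * (11.0 * SQRT2 - 1.0) / ((5.0 + 2.0 * SQRT2) ** 2 * (6.0 + 2.0 * SQRT2))
C = 16.0 * (35.0 + 38.0 * SQRT2) / (
    (5.0 + 2.0 * SQRT2) ** 3 * (6.0 + 2.0 * SQRT2) * (7.0 + 2.0 * SQRT2)
)
D = 0.02417  # fourth coefficient, approximate value quoted in the text


def z_terminal(p):
    """z from the four-term expansion in q = 1 - p, valid near p = 1."""
    q = 1.0 - p
    return ALPHA + q * (A + q * (B + q * (C + q * D)))


print(f"a = {A:.8f}, b = {B:.7f}, c = {C:.7f}")
for hundredths in range(100, 93, -1):
    p = hundredths / 100.0
    print(f"p = {p:.2f}   z = {z_terminal(p):.8f}")
```

Rounded to seven or eight places, the values printed by this sketch can be set beside those of Table I.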


The first seven values for p and z give a sufficient start for the second process. From the 
value of z corresponding to p = 0.96, the value of dz/dp can be calculated from the differential 
equation, written so that the quantity computed is positive, 

$$-\frac{dz}{dp} = \frac{1}{pq}\left(\frac{1}{z} + 2qz - 2 - z\right).$$

Unit error in z will introduce an error in 1/z of about 6, or in $\frac{1}{z} - z$ of about 7; when the divisor 
pq is as small as 0.0384, the error in this quantity is nearly 170 times as great as that in z, but in the 
opposite direction. As pq increases, however, the error in dz/dp becomes less than 100 times 
that in z, and the increment added to z to give the next value becomes sufficiently accurate. 

The increment may be calculated from the differential coefficient, and its backward 
differences. Thus if D stand for the operation of differentiation, $\Delta$ for forward differencing, 
and $\nabla$ for backward differencing, 

$$\Delta z = (e^D - 1)z = (1 + \tfrac{1}{2}D + \tfrac{1}{6}D^2 + \dots)Dz;$$

but 

$$e^{-D} = 1 - \nabla,$$

$$D = \nabla + \tfrac{1}{2}\nabla^2 + \tfrac{1}{3}\nabla^3 + \dots;$$

whence 

$$\Delta z = \{1 + \tfrac{1}{2}\nabla + \tfrac{5}{12}\nabla^2 + \tfrac{3}{8}\nabla^3 + \dots\}Dz;$$

which is conveniently applied in the form 

$$1 + \tfrac{1}{2}\nabla(1 + \tfrac{5}{6}\nabla(1 + \tfrac{9}{10}\nabla(1 + \dots,$$

where as many as three differences are used. 
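The coefficients of this backward-difference formula can be generated exactly. Since $e^{D} - 1 = \nabla/(1-\nabla)$ and $D = -\log(1-\nabla)$, the operator multiplying Dz is the quotient of the series $1 + \nabla + \nabla^2 + \dots$ by $1 + \tfrac{1}{2}\nabla + \tfrac{1}{3}\nabla^2 + \dots$. The sketch below performs that division in rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction


def backward_difference_coefficients(n_terms):
    """Coefficients g_k in  Delta z = (g_0 + g_1*N + g_2*N**2 + ...) D z,

    where N denotes the backward-difference operator.
    """
    # (e**D - 1) / N = 1/(1 - N) = 1 + N + N**2 + ...
    numerator = [Fraction(1)] * n_terms
    # D / N = -log(1 - N) / N = 1 + N/2 + N**2/3 + ...
    denominator = [Fraction(1, k + 1) for k in range(n_terms)]
    coeffs = []
    for k in range(n_terms):
        # Power-series division, solved one order at a time.
        carried = sum(denominator[j] * coeffs[k - j] for j in range(1, k + 1))
        coeffs.append((numerator[k] - carried) / denominator[0])
    return coeffs


coeffs = backward_difference_coefficients(4)
print([str(g) for g in coeffs])  # ['1', '1/2', '5/12', '3/8']

# Successive ratios give the multipliers of the nested form.
nested = [coeffs[k + 1] / coeffs[k] for k in range(3)]
print([str(r) for r in nested])  # ['1/2', '5/6', '9/10']
```

The ratios of successive coefficients are the multipliers 1/2, 5/6 and 9/10 of the nested form, which is the more convenient arrangement for hand computation.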

To minimize initial errors eight places were used in the values of z for p = 0.96, 0.95 and 
0.94. The scheme of calculation then starts as below: 

Table II. Calculation of z from the differential equation 

   p          z           dz/dp      ∇ dz/dp    ∇² dz/dp 
  0.96   0.4185 3482    0.110298 
  0.95   0.4196 4365    0.111472    0.001174 
  0.94   0.4207 6434    0.112671    0.001199    0.000025 
  0.93   0.4218 9715    0.113892    0.001221    0.000022 
  0.92   0.4230 423     0.115140    0.001248    0.000027 
  0.91   0.4242 000     0.116418    0.001278    0.000030 
  0.90   0.4253 707     0.117712    0.001294    0.000016 
  0.89   0.4265 544     0.119044    0.001332    0.000038 
  0.88   0.4277 516     0.120399    0.001355    0.000023 
  0.87   0.4289 625     0.121785    0.001386    0.000031 
  0.86   0.4301 874     0.123201    0.001416    0.000030 

(dz/dp is negative over this range; its magnitude is tabulated.) 

The second differences of dz/dp show but slight oscillation. It is not to be supposed that 
the seventh figure in z is always correct, but on trying a false start with an error -2 at 
p = 0.96, and +4 at p = 0.95, the errors in the subsequent figures alternate, with the 
greatest error -7 at p = 0.93, and at p = 0.86 are in exact agreement with the table above. 

As the process is continued dz/dp and its differences increase. Third differences become 
appreciable at about p = 0.70 and fourth differences at about p = 0.35. From p = 0.21 to 
p = 0.15 the interval was reduced to 0.005, from 0.15 to 0.10 to 0.002, and from 0.10 to 0.07 
to 0.001, in order to make the difference series decrease sufficiently rapidly. 

From the values of z between p = 0.070 and 0.078, the values of p corresponding with 
z = 0.663, 0.664, 0.665 and 0.666 were calculated, using initially nine figures, and from these 
the values of dp/dz calculated from the differential equation, thus 


Table III. Values of p used in the final stages of tabulation 

   z           p            dp/dz      ∇ dp/dz   ∇² dp/dz 
  0.663   0.0751002 75    0.968591 
  0.664   0.0741364 63    0.959053     -9538 
  0.665   0.0731821 47    0.949590     -9463        75 
  0.666   0.0722372 01    0.940203     -9387        76 
  0.667   0.0713017 2     0.930890     -9313        74 
  0.668   0.0703754 6     0.921650     -9240        73 
  0.669   0.0694584 0     0.912484     -9166        74 

(dp/dz is negative; its magnitude is tabulated. Differences are in units of the sixth decimal place.) 
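The entries for dp/dz can be checked against the differential equation for the case c = 1, $pq\,dz/dp = 2pz - (1-z)^2/z$. The sketch below evaluates its reciprocal at each tabulated pair; the values of z, p and dp/dz are entered as read from the table (z from 0.663 to 0.669), and the names are illustrative.

```python
def dp_dz(p, z):
    """dp/dz from the equation  p*q*dz/dp = 2*p*z - (1 - z)**2 / z."""
    q = 1.0 - p
    return p * q / (2.0 * p * z - (1.0 - z) ** 2 / z)


# (z, p, printed magnitude of dp/dz), as read from Table III.
ROWS = [
    (0.663, 0.075100275, 0.968591),
    (0.664, 0.074136463, 0.959053),
    (0.665, 0.073182147, 0.949590),
    (0.666, 0.072237201, 0.940203),
    (0.667, 0.07130172, 0.930890),
    (0.668, 0.07037546, 0.921650),
    (0.669, 0.06945840, 0.912484),
]

for z, p, printed in ROWS:
    computed = -dp_dz(p, z)  # dp/dz is negative; compare magnitudes
    print(f"z = {z:.3f}: computed {computed:.6f}, printed {printed:.6f}")

max_gap = max(abs(-dp_dz(p, z) - printed) for z, p, printed in ROWS)
```

Agreement to within a few units in the sixth decimal place is what would be expected of a hand computation carried to this number of figures.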


The values from z = 0.667 onwards were obtained by calculating the successive differences 
from the differential coefficient. From z = 0.670 to z = 0.790 the interval was 0.002; but 
from that point using fourth differences the interval can be raised to 0.005. The last point 
calculated gave z = 0.875, p = 0.0004017 38, at which stage p is decreasing by more than a 
quarter of its value at each step. 


VI. Numerical applications 


It appears from equation (4) that the gradient is a maximum where $g = pq/2\lambda$, or where 
z = 1/2. This occurs when p = 0.442428, when $g\lambda$ = 0.1233427. 

With a population cross-breeding at random the proportion of heterozygotes for any value 
of p is 2pq. The total number of heterozygotes in any length of habitat in comparison with 
the number of organisms is 

$$\int 2pq\,dx,$$

between the limits considered; writing $\lambda\,dp/pqz$ for dx, this is seen to be merely 

$$2\lambda\int\frac{dp}{z}.$$

Now 

$$d(pqz) = pq\,dz + (1-2p)z\,dp,$$

but by equation (5a) 

$$pq\,dz = 2pz\,dp - \frac{(1-z)^2}{z}\,dp;$$

hence 

$$d(pqz) = \left\{z - \frac{(1-z)^2}{z}\right\}dp = 2\,dp - \frac{dp}{z}.$$

Consequently, the indefinite integral 

$$2\lambda\int\frac{dp}{z}$$

may be expressed in the form $2\lambda(2p - pqz)$. 

Since pqz vanishes when p = 0, or p = 1, the total number of heterozygotes maintained 
at any time is equal to the population of the length of habitat 

$$4\lambda = 4\sqrt{k/m},$$

proportional to the square root of the coefficient of diffusion, and inversely to the square 
root of the intensity of selection. The relation 

$$\int\frac{dp}{z} = 2p - pqz$$

also affords a needed check to the accuracy of the values of z obtained, for if the process of 
calculation at any stage had allowed of a systematic drift across the curves satisfying the 
equation, the values of 1/z would have been systematically raised or lowered, and the value 
of the integral would depart from its calculated value. The test may be applied by sections. 



The largest discrepancy found is between 0.4 and 0.5, and amounts to about one part in two 
millions, or nearly 2 1/2 units in the seventh place. The value of p at the point of inflexion may 
thus really be one part in a million higher than that given above. 

The effective centre of the wave in its advance is the point at which there are as many 
mutant genes in front as there are parent genes behind. The number of parent genes behind 
any point, expressed in terms of the population per unit length of the habitat, is 

$$\int_{-\infty}^{x} q\,dx = \lambda\int_{p}^{1}\frac{dp}{pz}.$$

Since z behaves regularly in the neighbourhood of p = 1, this integral offers no difficulty to 
direct evaluation. The value from p = 1/2 comes to $1.540762\lambda$. The number of mutant genes 
in advance of any point is more troublesome to ascertain. The form 

$$\lambda\int_{0}^{p}\frac{dp}{qz}$$

is unsuitable, since the differential coefficient of z is infinite at p = 0. Writing $2\,dp - d(pqz)$ 
for dp/z, it takes the form 

$$\lambda\left\{-2\log q - \int\frac{d(pqz)}{q}\right\},$$


which may be used, though with some difficulty. Near the terminus, the most satisfactory 
process is to expand d(pqz) in the form 

$$d(pqz) = pz\,dq + q\,d(pz),$$

giving the third form 



2 log 9 -i?z-( 2 ? + log 9 )z-J (p + log 9 )dzj, 


which may be used with confidence, since p + log q is of the order of $p^2$, and becomes 
negligible within the range tabulated. 

The integral to 1/2 is found to be $1.244939\lambda$, showing, on comparison with the number 
of parent genes behind this point, that the effective centre lies behind the 50 per cent point 
by $0.296823\lambda$, or at a place where p has risen to 0.535709. At this point the number of mutant 
genes in front and of parent genes behind are both equal to the total number in the length 
$1.398150\lambda$. We take this point as the origin of the co-ordinate x, in Table IV. 

To put the situation concretely, let us suppose a mutation giving a selective advantage of 
1 per cent is spreading along a continuously occupied shore line. Suppose that the standard 
displacement of young from parents in each generation is 100 yards. Then with m = 0.01 
per generation, k = 5000 square yards per generation, and $\lambda = \sqrt{k/m}$ = 717 yards. The 

number of heterozygotes is equal to the population of 2868 yards, or rather more than a mile 
and a half of coast, though it is spread over 6 or 8 miles. The rate of advance $v = 2\sqrt{mk}$ is 


Table IV. Values of z and x for each integral percentage of p. 
The gradient, -dp/dx, is pqz, where q = 1 - p 


p 

2 

X 

P 

z 

X 

0 

001 

I'OOOOOOO 

0-784 0738 

00 

7-6061 

0-60 

0-487 3136 

0-2968 

002 

0-749 9772 

6-6817 

0-61 

0-486 2601 

0-2136 

003 

0-726 6592 

6-1182 

0-52 

0-483 2244 

0-1309 

004 

0-708 2499 

6-7027 

0-63 

0-481 2360 

0-0477 

005 

0-693 0407 

6-3693 

0-64 

0-479 2807 

- 0-0360 

006 

0-679 9482 

6-0883 

0-65 

0-477 3604 

- 0 -1203 

007 

0-668 4082 

4-8437 

0-66 

0-476 4729 

- 0-2053 

0-08 

0-658 0636 

4-6261 

0-57 

0-473 6171 

^ 0-2910 

0-00 

0-648 6717 

4-4291 

0-68 

0-471 7921 

- 0-3776 

010 

0-640 0603 

4-2485 

0-69 

0-469 9969 

- 0-4661 

on 

0-632 1012 

4-0811 

0-60 

0-468 2306 

- 0-6636 

012 

0-624 6967 

3-9246 

0-61 

0-466 4921 

- 0 - 64.31 

013 

0-617 7706 

3-7774 

0-62 

0-464 7808 

- 0-7338 

014 

0-611 2612 

3-6380 

0-63 

0-463 0969 

- 0-8358 

015 

0-605 1194 

3-6063 

0-64 

0-461 4365 

- 0-9191 

016 

0-699 3038 

3-3785 

0-66 

0-459 8020 

- 1-0139 

017 

0-593 7802 

3-2668 

0-66 

0-468 1916 

- 1-1103 

018 

0-688 5197 

3-1396 

0-67 

0-456 6047 

- 1-2086 

010 

0-583 4974 

3-0264 

0-68 

0-466 0406 

- 1-3085 

0-20 

0-678 6922 

2-9167 

0*69 

0-463 4987 

- 1-4106 

0-21 

0-574 0856 

2-8102 

0-70 

0-461 9786 

- 1-5147 

0-22 

0-569 6611 

2-7066 

0-71 

0-460 4792 

- 1-6213 

0-23 

a665 4051 

2-6056 

0-72 

0-449 0 ( X)5 

~ 1 - 7.304 

0-24 

0-561 3047 

2-5068 

0-73 

0-447 6417 

- 1-8423 

0-25 

0-667 3488 

2-4101 

0-74 

0-446 1024 

- 1-9572 

0-26 

0-553 6274 

2 - 31.54 

075 

0-444 6820 

- 2-0764 

0-27 

0-549 8316 

22223 

0-76 

0-443 2802 

- 2-1972 

0-28 

0-546 2533 

2-1308 

0-77 

0-441 8964 

- 2-3229 

0*29 

0-542 78.52 

2-0406 

0-78 

0-440 5303 

- 2-4629 

0-30 

0-.539 4208 

1-9518 

0-79 

0-439 1812 

- 2 .5876 

0-31 

0-636 1639 

1-8640 

0-80 

0-437 8492 

- 2-7276 

0-32 

0-532 9792 

1-7773 

0-81 

0-436 63.30 

- 2-8733 

0-33 

0-529 8914 

1-6916 

0-82 

0 - 4,36 2329 

- 3-0265 

0-.34 

0-626 8861 

1-6066 

0-83 

0-433 9492 

~ 3-1849 

0-36 

• 0-623 9689 

1-5224 

0-84 

0-432 6805 

- 3-3626 

0-36 

0-621 1060 

1-4388 

0-85 

0-431 4265 

- 3 .5293 

0-37 

0-618 3237 

1 - 3.558 

0-86 

0-430 1874 

- 3-7166 

0-38 

0-616 6085 

1-2732 

0-87 

0-428 9625 

- 3-9160 

0-39 

0-512 9.576 

1-1911 

0-88 

0-427 7616 

- 4-1296 

0-40 

0-610 3676 

1-1093 

0-89 

0-426 6544 

- 4-3697 

0-41 

0-607 8362 

1-0278 

0-90 

0-426 3706 

- 4-6097 

0-42 

0-606 3607 

0-9465 

0-91 

0-424 2001 

- 4-8837 

0-43 

0-602 9387 

0-8663 

0-92 

0-423 0422 

- 6-1876 

0-44 

0-600 6680 

0-7842 

0-93 

0-421 8972 

- 6-6293 

0-45 

0-498 2466 

0-7031 

0-94 

0-420 7643 

- 6-9205 

0*46 

0-496 9724 

0-6220 

0-96 

0-419 6436 

- 6-3796 

0-47 

0-493 7437 

0-6408 

0-96 

0-418 6348 

- 6-9371 

0-48 

0-491 6686 

0-4694 

0-97 

0-417 4376 

- 7-6502 

0-49 

0-489 4159 

0-3777 

0-98 

0-416 3618 

- 8-6479 

0-60 

0-487 3136 

0-2968 

0-99 

0-416 2772 

- 10-3480 




1-00 

0-414 2136 

00 






about 14 yards per generation, or less than 10 miles in 1000 generations. To spread over a
habitat of several hundred miles might well take 10,000 or 100,000 generations. In consequence,
at any one time, the number of such waves of selective advance, simultaneously
in progress, must be large. The effective centre in our example is about 210 yards behind the
50 per cent point, while the steepest gradient of gene ratio, which is the point of most rapid
genetic change, is about 330 yards in advance of this point.
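The arithmetic of this example can be checked with a short sketch. It assumes the figures quoted in the text (m = 0.01, a standard displacement of 100 yards, and hence k = 5000 square yards per generation); the variable names and the 300-mile habitat are illustrative only.

```python
import math

# Inputs quoted in the worked example of the text.
m = 0.01        # selective advantage per generation
sigma = 100.0   # standard displacement per generation, yards

# Diffusion coefficient consistent with the text's k = 5000 sq. yards.
k = sigma ** 2 / 2.0

# Rate of advance v = 2 * sqrt(m * k), in yards per generation.
v = 2.0 * math.sqrt(m * k)

YARDS_PER_MILE = 1760
miles_per_1000_generations = v * 1000 / YARDS_PER_MILE

def generations_to_cross(miles):
    """Generations needed for the wave to travel the given distance."""
    return miles * YARDS_PER_MILE / v

# Hypothetical habitat length, chosen only for illustration.
generations_300_miles = generations_to_cross(300)
print(round(v, 2), round(miles_per_1000_generations, 2))
```

Under these assumptions the wave needs some tens of thousands of generations to cross a habitat of a few hundred miles, which is the order of magnitude the text describes.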

At any given spot the rate of change per generation in the proportion of mutant genes is

$$vg = 2pqmz,$$

which is less than ¼ per cent at its highest point, where p is about 44 per cent. Very large
counts would therefore be needed, supposing the gene to affect any measurable or observable
characteristics, to detect the change in progress by observations during the course of only
a few generations. If, for example, both homozygotes and heterozygotes could be distinguished
with certainty, the sampling variance of p, as estimated from the examination
of n individuals, would be pq/2n, while that of the difference as estimated from two such
counts would be pq/n.

If n were as high as 10,000, the standard error is thus ½ per cent when p is 0.5 where the
rate of change is only 0.244 per cent in each generation, so that about 5 generations must
elapse before a significant increase in the percentage could be observed. The rate of change
is greatest in relation to its sampling error at the maximum value of z√(pq) or when

$$z = \frac{1}{1 + \sqrt{\tfrac12 + p}},$$

which occurs when p = 0.377, or about 1.297λ in advance of the effective centre of the wave,
where, if the number counted, n, is equated to 1/m², the rate of change is just over half (0.5005)
its standard deviation in each generation. If the change manifests itself in a metrical
character, to the variance of which other factors, environmental or genetic, contribute, the
change will be most easily detected at some point between p = 0.377 and p = 0.442. The
direction of advance might also be indicated from observations at a single epoch by the
asymmetry of the wave, which is more extended behind than before, or by the skewness of
the distribution of heterozygotes, though these features might be expected to be obscured
by irregularities in the habitat.
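The two numerical claims of this paragraph can be reproduced with a minimal sketch. It assumes the rate of change 2pqmz and the variance pq/n of a difference between two counts, as given in the text, and reads the garbled sample size as n = 1/m²; the function names are illustrative.

```python
import math

def z_most_detectable(p):
    """Value of z at the point where z * sqrt(pq) is a maximum."""
    return 1.0 / (1.0 + math.sqrt(0.5 + p))

def rate_of_change(p, z, m):
    """Change per generation in the proportion of mutant genes, 2pqmz."""
    return 2.0 * p * (1.0 - p) * m * z

def se_of_difference(p, n):
    """Standard error of the difference between two counts of n, sqrt(pq/n)."""
    return math.sqrt(p * (1.0 - p) / n)

m = 0.01

# Case 1: p = 0.5, n = 10,000, with z = 0.4873136 taken from Table IV.
rate_half = rate_of_change(0.5, 0.4873136, m)
se_half = se_of_difference(0.5, 10_000)

# Case 2: the most favourable point, p = 0.377126, with n = 1/m^2 (assumed reading).
p_best = 0.377126
ratio_best = (rate_of_change(p_best, z_most_detectable(p_best), m)
              / se_of_difference(p_best, 1.0 / m ** 2))
print(rate_half, se_half, ratio_best)
```

The ratio in the second case reduces to 2z√(pq), which is independent of m once n is tied to 1/m².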

VII. Appendix on the calculation of special points
(a) The point of inflexion

At the point of greatest gradient, since in general

$$kg\frac{dg}{dp} - vg + mpq = 0,$$

we have the relation

$$vg = mpq,$$

or, simply,

$$z = \tfrac12.$$




The relevant tabular values are

   p         z          δ²z        x          δ²x
  0.46    0.4959724     ..      0.622002      ..
  0.45    0.4982466     472     0.703128     -47
  0.44    0.5005680     493     0.784207     +16
  0.43    0.5029387     ..      0.865302      ..

Inverse interpolation for z = 0.5 gives p = 0.4424276; whence direct interpolation gives
x = +0.764526.
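The interpolation can be sketched as follows. This uses a cubic Lagrange polynomial through the four tabulated points, which is an assumption about method (the text does not say which interpolation formula was used), and the tabular values are entered as read from the appendix table, with digits restored by checking the second differences.

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange polynomial through (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Tabular values near the point of inflexion.
p_tab = [0.46, 0.45, 0.44, 0.43]
z_tab = [0.4959724, 0.4982466, 0.5005680, 0.5029387]
x_tab = [0.622002, 0.703128, 0.784207, 0.865302]

# Inverse interpolation: treat p as a function of z and evaluate at z = 1/2.
p_inflexion = lagrange(z_tab, p_tab, 0.5)

# Direct interpolation: x as a function of p at the value just found.
x_inflexion = lagrange(p_tab, x_tab, p_inflexion)
print(p_inflexion, x_inflexion)
```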


(b) The point at which changes of frequency are most easily detected

This will not be at the point where change in frequency is most rapid, because the standard
error of a comparison of frequencies is not constant, but varies as √(pq). We must therefore
maximize not zpq, but z√(pq). This gives

$$\frac{dz}{dp} = -\frac{z}{2}\left(\frac{1}{p} - \frac{1}{q}\right);$$

but, in general,

$$\frac{d(pqz)}{dp} = 2 - \frac{1}{z},$$

hence

$$(1 - 2p)z^2 - 4z + 2 = 0,$$

$$z = \frac{1}{1 + \sqrt{\tfrac12 + p}}.$$

The numerical values in this neighbourhood are

   p      1/(1+√(½+p))       z         Difference      δ²
  0.36     0.5188439     0.5211060    -0.0022621       ..
  0.37     0.5174007     0.5183237    -0.0009230      -509
  0.38     0.5159737     0.5156085    +0.0003652      -471
  0.39     0.5145638     0.5129575    +0.0016063       ..

Inverse interpolation for zero difference gives p = 0.377126; whence x is found to be
1.297092.

(c) The median of the heterozygotes

Since the proportion of the heterozygotes behind any point is given by

$$p - \tfrac12 pqz,$$

we may equate this expression to ½. The following values will serve for interpolation:

   p      p - ½pqz       δ²         x          δ²
  0.54   0.48047333      ..     -0.035989      ..
  0.55   0.49092665     4178    -0.120301     -680
  0.56   0.50142174     4139    -0.205293     -752
  0.57   0.51195822      ..     -0.291037      ..

Inverse interpolation gives p = 0.5586476, x = -0.193757.








VIII. Summary 

The form is discussed of a steadily progressive wave of gene increase due to the local
establishment of a favourable mutation, for the case of a uniform linearly distributed
population.

The equation obtained by the analogy of physical diffusion is found to be consistent with
all velocities of advance above a certain lower limit.

The indeterminacy of velocity is resolved by comparison with the properties of multiplying
aggregates of particles, constantly subjected to random scattering. It appears that
the actual velocity of advance must be the minimum compatible with the differential
equation.

This velocity is proportional to the square root of the intensity of selective advantage
and to the standard deviation of scattering in each generation, or to the square root of the
diffusion coefficient when time is measured in generations. It may be expressed in the form

$$v = \sigma\sqrt{2m},$$

or

$$v = 2\sqrt{km},$$

where m is the selective advantage, σ the standard deviation of scattering, and k the diffusion
coefficient.

The “length” of the wave, or the distance between any two assigned gene ratios, is
proportional to

$$\lambda = \sqrt{k/m},$$

which may conveniently be taken as the unit of length.

The form of the wave is tabulated so as to show, for each percentage of the frequency of
the mutant gene, the value of the gradient of gene ratio and the position at which this
percentage occurs relative to the effective centre of the wave, i.e. to the point in advance of
which there are as many mutant genes as there are parent genes behind it.

Stages of special interest which occur in succession at each point reached are

                                                     Mutant      Distance in
                                                     genes %     advance of centre
  The point at which changes in frequency
    are most easily detected                          37.7          1.30λ
  The point of inflexion                              44.2          0.76λ
  Equality of gene ratio, mode of heterozygotes       50.0          0.30λ
  Effective centre of wave                            53.6          0
  Median of heterozygotes                             55.9         -0.19λ

32

THE USE OF MULTIPLE MEASUREMENTS IN 
TAXONOMIC PROBLEMS 


AUTHOR’S NOTE 

This was written to embody the working of a practical numerical
example arising in plant taxonomy, in which the concept of a discriminant
function seems to be of immediate service.


Reprinted from Annals of Eugenics, Vol. VII, Pt. II, pp. 179-188, 1936.





THE USE OF MULTIPLE MEASUREMENTS IN 
TAXONOMIC PROBLEMS 

By R. A. FISHER, Sc.D., F.R.S.

I. Discriminant functions 

When two or more populations have been measured in several characters, $x_1, \ldots, x_s$,
special interest attaches to certain linear functions of the measurements by which the 
populations are best discriminated. At the author’s suggestion use has already been made 
of this fact in craniometry (a) by Mr E. S. Martin, who has applied the principle to the 
sex differences in measurements of the mandible, and (b) by Miss Mildred Barnard, who 
showed how to obtain from a series of dated series the particular compound of cranial 
measurements showing most distinctly a progressive or secular trend. In the present paper 
the application of the same principle will be illustrated on a taxonomic problem; some 
questions connected with the precision of the processes employed will also be discussed. 

II. Arithmetical procedure 

Table I shows measurements of the flowers of fifty plants each of the two species Iris
setosa and I. versicolor, found growing together in the same colony and measured by
Dr E. Anderson, to whom I am indebted for the use of the data. Four flower measurements
are given. We shall first consider the question: What linear function of the four
measurements

$$X = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_4 x_4$$

will maximize the ratio of the difference between the specific means to the standard
deviations within species? The observed means and their differences are shown in Table II.
We may represent the differences by $d_p$, where p = 1, 2, 3 or 4 for the four measurements.

The sums of squares and products of deviations from the specific means are shown in
Table III. Since fifty plants of each species were used these sums contain 98 degrees
of freedom. We may represent these sums of squares or products by $S_{pq}$, where p and q
take independently the values 1, 2, 3 and 4.

Then for any linear function, X, of the measurements, as defined above, the difference
between the means of X in the two species is

$$D = \lambda_1 d_1 + \lambda_2 d_2 + \lambda_3 d_3 + \lambda_4 d_4,$$

while the variance of X within species is proportional to

$$S = \sum_{p=1}^{4}\sum_{q=1}^{4} \lambda_p \lambda_q S_{pq}.$$

The particular linear function which best discriminates the two species will be one for





Table I

        Iris setosa                  Iris versicolor                Iris virginica
 Sepal  Sepal  Petal  Petal     Sepal  Sepal  Petal  Petal     Sepal  Sepal  Petal  Petal
 length width  length width     length width  length width     length width  length width

  5.1    3.5    1.4    0.2       7.0    3.2    4.7    1.4       6.3    3.3    6.0    2.5
  4.9    3.0    1.4    0.2       6.4    3.2    4.5    1.5       5.8    2.7    5.1    1.9
  4.7    3.2    1.3    0.2       6.9    3.1    4.9    1.5       7.1    3.0    5.9    2.1
  4.6    3.1    1.5    0.2       5.5    2.3    4.0    1.3       6.3    2.9    5.6    1.8
  5.0    3.6    1.4    0.2       6.5    2.8    4.6    1.5       6.5    3.0    5.8    2.2
  5.4    3.9    1.7    0.4       5.7    2.8    4.5    1.3       7.6    3.0    6.6    2.1
  4.6    3.4    1.4    0.3       6.3    3.3    4.7    1.6       4.9    2.5    4.5    1.7
  5.0    3.4    1.5    0.2       4.9    2.4    3.3    1.0       7.3    2.9    6.3    1.8
  4.4    2.9    1.4    0.2       6.6    2.9    4.6    1.3       6.7    2.5    5.8    1.8
  4.9    3.1    1.5    0.1       5.2    2.7    3.9    1.4       7.2    3.6    6.1    2.5
  5.4    3.7    1.5    0.2       5.0    2.0    3.5    1.0       6.5    3.2    5.1    2.0
  4.8    3.4    1.6    0.2       5.9    3.0    4.2    1.5       6.4    2.7    5.3    1.9
  4.8    3.0    1.4    0.1       6.0    2.2    4.0    1.0       6.8    3.0    5.5    2.1
  4.3    3.0    1.1    0.1       6.1    2.9    4.7    1.4       5.7    2.5    5.0    2.0
  5.8    4.0    1.2    0.2       5.6    2.9    3.6    1.3       5.8    2.8    5.1    2.4
  5.7    4.4    1.5    0.4       6.7    3.1    4.4    1.4       6.4    3.2    5.3    2.3
  5.4    3.9    1.3    0.4       5.6    3.0    4.5    1.5       6.5    3.0    5.5    1.8
  5.1    3.5    1.4    0.3       5.8    2.7    4.1    1.0       7.7    3.8    6.7    2.2
  5.7    3.8    1.7    0.3       6.2    2.2    4.5    1.5       7.7    2.6    6.9    2.3
  5.1    3.8    1.5    0.3       5.6    2.5    3.9    1.1       6.0    2.2    5.0    1.5
  5.4    3.4    1.7    0.2       5.9    3.2    4.8    1.8       6.9    3.2    5.7    2.3
  5.1    3.7    1.5    0.4       6.1    2.8    4.0    1.3       5.6    2.8    4.9    2.0
  4.6    3.6    1.0    0.2       6.3    2.5    4.9    1.5       7.7    2.8    6.7    2.0
  5.1    3.3    1.7    0.5       6.1    2.8    4.7    1.2       6.3    2.7    4.9    1.8
  4.8    3.4    1.9    0.2       6.4    2.9    4.3    1.3       6.7    3.3    5.7    2.1
  5.0    3.0    1.6    0.2       6.6    3.0    4.4    1.4       7.2    3.2    6.0    1.8
  5.0    3.4    1.6    0.4       6.8    2.8    4.8    1.4       6.2    2.8    4.8    1.8
  5.2    3.5    1.5    0.2       6.7    3.0    5.0    1.7       6.1    3.0    4.9    1.8
  5.2    3.4    1.4    0.2       6.0    2.9    4.5    1.5       6.4    2.8    5.6    2.1
  4.7    3.2    1.6    0.2       5.7    2.6    3.5    1.0       7.2    3.0    5.8    1.6
  4.8    3.1    1.6    0.2       5.5    2.4    3.8    1.1       7.4    2.8    6.1    1.9
  5.4    3.4    1.5    0.4       5.5    2.4    3.7    1.0       7.9    3.8    6.4    2.0
  5.2    4.1    1.5    0.1       5.8    2.7    3.9    1.2       6.4    2.8    5.6    2.2
  5.5    4.2    1.4    0.2       6.0    2.7    5.1    1.6       6.3    2.8    5.1    1.5
  4.9    3.1    1.5    0.2       5.4    3.0    4.5    1.5       6.1    2.6    5.6    1.4
  5.0    3.2    1.2    0.2       6.0    3.4    4.5    1.6       7.7    3.0    6.1    2.3
  5.5    3.5    1.3    0.2       6.7    3.1    4.7    1.5       6.3    3.4    5.6    2.4
  4.9    3.6    1.4    0.1       6.3    2.3    4.4    1.3       6.4    3.1    5.5    1.8
  4.4    3.0    1.3    0.2       5.6    3.0    4.1    1.3       6.0    3.0    4.8    1.8
  5.1    3.4    1.5    0.2       5.5    2.5    4.0    1.3       6.9    3.1    5.4    2.1
  5.0    3.5    1.3    0.3       5.5    2.6    4.4    1.2       6.7    3.1    5.6    2.4
  4.5    2.3    1.3    0.3       6.1    3.0    4.6    1.4       6.9    3.1    5.1    2.3
  4.4    3.2    1.3    0.2       5.8    2.6    4.0    1.2       5.8    2.7    5.1    1.9
  5.0    3.5    1.6    0.6       5.0    2.3    3.3    1.0       6.8    3.2    5.9    2.3
  5.1    3.8    1.9    0.4       5.6    2.7    4.2    1.3       6.7    3.3    5.7    2.5
  4.8    3.0    1.4    0.3       5.7    3.0    4.2    1.2       6.7    3.0    5.2    2.3
  5.1    3.8    1.6    0.2       5.7    2.9    4.2    1.3       6.3    2.5    5.0    1.9
  4.6    3.2    1.4    0.2       6.2    2.9    4.3    1.3       6.5    3.0    5.2    2.0
  5.3    3.7    1.5    0.2       5.1    2.5    3.0    1.1       6.2    3.4    5.4    2.3
  5.0    3.3    1.4    0.2       5.7    2.8    4.1    1.3       5.9    3.0    5.1    1.8






Table II. Observed means for two species and their difference (cm.)

                        Versicolor     Setosa     Difference (V - S)
  Sepal length (x1)       5.936        5.006           0.930
  Sepal width (x2)        2.770        3.428          -0.658
  Petal length (x3)       4.260        1.462           2.798
  Petal width (x4)        1.326        0.246           1.080


Table III. Sums of squares and products of four measurements, within species (cm.²)

                   Sepal length   Sepal width   Petal length   Petal width
  Sepal length       19.1434        9.0356         9.7634        3.2394
  Sepal width         9.0356       11.8658         4.6232        2.4746
  Petal length        9.7634        4.6232        12.2978        3.8794
  Petal width         3.2394        2.4746         3.8794        2.4604


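which the ratio D²/S is greatest, by variation of the four coefficients λ₁, λ₂, λ₃ and λ₄
independently. This gives for each λ

$$\frac{D}{S^2}\left\{2S\frac{\partial D}{\partial \lambda} - D\frac{\partial S}{\partial \lambda}\right\} = 0,$$

or

$$\frac12\frac{\partial S}{\partial \lambda} = \frac{S}{D}\frac{\partial D}{\partial \lambda},$$

where it may be noticed that S/D is a factor constant for the four unknown coefficients.
Consequently, the coefficients required are proportional to the solutions of the equations

$$\begin{aligned}
S_{11}\lambda_1 + S_{12}\lambda_2 + S_{13}\lambda_3 + S_{14}\lambda_4 &= d_1,\\
S_{12}\lambda_1 + S_{22}\lambda_2 + S_{23}\lambda_3 + S_{24}\lambda_4 &= d_2,\\
S_{13}\lambda_1 + S_{23}\lambda_2 + S_{33}\lambda_3 + S_{34}\lambda_4 &= d_3,\\
S_{14}\lambda_1 + S_{24}\lambda_2 + S_{34}\lambda_3 + S_{44}\lambda_4 &= d_4.
\end{aligned} \qquad (1)$$

If, in turn, unity is substituted for each of the differences and zero for the others, the
solutions obtained constitute the matrix of multipliers reciprocal to the matrix of S;
numerically we find: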

Table IV. Matrix of multipliers reciprocal to the sums of squares and products
within species (cm.⁻²)

                   Sepal length   Sepal width   Petal length   Petal width
  Sepal length      0.1187161     -0.0668666     -0.0816158     0.0396350
  Sepal width      -0.0668666      0.1452736      0.0334101    -0.1107529
  Petal length     -0.0816158      0.0334101      0.2193614    -0.2720206
  Petal width       0.0396350     -0.1107529     -0.2720206     0.8945506

These values may be denoted by $s_{pq}$ for values of p and q from 1 to 4.















Multiplying the columns of the matrix in Table IV by the observed differences, we have
the solutions of the equation (1) in the form

  λ₁ = -0.0311511,  λ₂ = -0.1839075,  λ₃ = +0.2221044,  λ₄ = +0.3147370,

so that, if we choose to take the coefficient of sepal length to be unity, the compound
measurement required is

  X = x₁ + 5.9037x₂ - 7.1299x₃ - 10.1036x₄.

If in this expression we substitute the values observed in setosa plants, the mean, as found
from the values in Table I, is

  5.006 + (3.428)(5.9037) - (1.462)(7.1299) - (0.246)(10.1036) = 12.3345 cm.;

for versicolor, on the contrary, we have

  5.936 + (2.770)(5.9037) - (4.260)(7.1299) - (1.326)(10.1036) = -21.4815 cm.

The difference between the average values of the compound measurements being thus
33.816 cm.
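The procedure so far can be sketched in a few lines. The sketch below solves equations (1) directly by Gaussian elimination rather than through the reciprocal matrix of Table IV, which is a substitution of method that should give the same coefficients; the helper `solve` and all variable names are illustrative, and the data are those of Tables II and III.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Table III: sums of squares and products within species (cm^2).
S = [
    [19.1434, 9.0356, 9.7634, 3.2394],
    [9.0356, 11.8658, 4.6232, 2.4746],
    [9.7634, 4.6232, 12.2978, 3.8794],
    [3.2394, 2.4746, 3.8794, 2.4604],
]
# Table II: specific means (cm).
mean_versicolor = [5.936, 2.770, 4.260, 1.326]
mean_setosa = [5.006, 3.428, 1.462, 0.246]
d = [v - s for v, s in zip(mean_versicolor, mean_setosa)]

lam = solve(S, d)                      # crude coefficients lambda_1..lambda_4
coef = [l / lam[0] for l in lam]       # rescaled so sepal length has coefficient 1

def compound(x):
    """Compound measurement X for one set of four measurements."""
    return sum(c * xi for c, xi in zip(coef, x))

difference = compound(mean_setosa) - compound(mean_versicolor)
print(lam, coef, difference)
```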

The distinctness of the metrical characters of the two species may now be gauged by
comparing this difference between the average values with its standard error. Using the
values of Table III, with the coefficients of our compound, we have

  19.1434 + (9.0356)(5.9037) - (9.7634)(7.1299) - (3.2394)(10.1036) = -29.8508,

  9.0356 + (11.8658)(5.9037) - (4.6232)(7.1299) - (2.4746)(10.1036) = 21.1224,

  9.7634 + (4.6232)(5.9037) - (12.2978)(7.1299) - (3.8794)(10.1036) = -89.8206,

  3.2394 + (2.4746)(5.9037) - (3.8794)(7.1299) - (2.4604)(10.1036) = -34.6699,

and finally,

  -29.8508 + (21.1224)(5.9037) + (89.8206)(7.1299) + (34.6699)(10.1036) = 1085.5522.

The average variance of the two species in respect of the compound measurements may
be estimated by dividing this value (1085.5522) by 95; the variance of the difference
between two means of fifty plants each, by dividing again by 25. For single plants the
variance is 11.4269, so that the mean difference, 33.816 cm., between a pair of plants of
different species has a standard deviation of 4.781 cm. For means of fifty the same average
difference has the standard error 0.6761 cm., or only about one-fiftieth of its value.
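The same quantities follow from a single quadratic form. The sketch below assumes the rounded coefficients of the compound as printed and the matrix of Table III; because of rounding the sum of squares agrees with the printed 1085.5522 only to about the second decimal place.

```python
import math

# Table III: sums of squares and products within species (cm^2).
S = [
    [19.1434, 9.0356, 9.7634, 3.2394],
    [9.0356, 11.8658, 4.6232, 2.4746],
    [9.7634, 4.6232, 12.2978, 3.8794],
    [3.2394, 2.4746, 3.8794, 2.4604],
]
# Coefficients of the compound X, sepal length taken as unity.
c = [1.0, 5.9037, -7.1299, -10.1036]

# Sum of squares of X within species: sum over p, q of c_p * c_q * S_pq.
sum_sq = sum(c[p] * c[q] * S[p][q] for p in range(4) for q in range(4))

variance_single = sum_sq / 95                  # variance for a single plant
sd_single = math.sqrt(variance_single)         # standard deviation, single plant
sd_pair = math.sqrt(2 * variance_single)       # difference between two plants
se_means = math.sqrt(variance_single / 25)     # difference of two means of fifty
print(sum_sq, variance_single, sd_single, sd_pair, se_means)
```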

III. Interpretation

The ratio of the difference between the means of the chosen compound measurement
to its standard error in individual plants is of interest also in relation to the probability
of misclassification, if the specific nature were judged wholly from the measurements.
For reasons to be discussed later we shall estimate the variance of a single plant by dividing
1085.5522 by 95, giving 11.4269 cm.² for the variance, and 3.3804 cm. for the standard
deviation. Supposing that a plant is misclassified, if its deviation in the right direction
exceeds half the difference, 33.816 cm., between the species, the ratio to the standard as
estimated is 5.0018.

The table of the normal distribution (Statistical Methods, Table II) shows that a ratio
4.89164 is exceeded five times in a million, and 5.32672 only once in two million trials.
By logarithmic interpolation the frequency appropriate to a ratio 5.0018 is about 2.79 per
million. If the variances of the two species are unequal, this frequency is somewhat
overestimated by this method, since we ought to divide the specific difference in proportion
to the two standard deviations, and for constant sum of variances the sum of the standard
deviations is greatest when they are equal. We may, therefore, at once conclude that if
the measurements are nearly normally distributed the probability of misclassification,
using the compound measurement only is less than three per million.
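The logarithmic interpolation mentioned here can be sketched as below. It takes the two tabulated ratios and the frequencies exactly as they are quoted in the text (five per million and one per two million) and interpolates linearly in the logarithm of the frequency; it does not recompute the normal tail itself, and the function name is illustrative.

```python
import math

def log_interpolate(x, x0, f0, x1, f1):
    """Interpolate f at x, linearly in log f, between (x0, f0) and (x1, f1)."""
    t = (x - x0) / (x1 - x0)
    return math.exp(math.log(f0) + t * (math.log(f1) - math.log(f0)))

# Tabulated points as quoted in the text.
ratio_a, freq_a = 4.89164, 5e-6      # "five times in a million"
ratio_b, freq_b = 5.32672, 0.5e-6    # "once in two million"

ratio = 16.908 / 3.3804              # half the difference over the standard deviation
freq = log_interpolate(ratio, ratio_a, freq_a, ratio_b, freq_b)
per_million = freq * 1e6
print(ratio, per_million)
```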

The same ratio is of interest from another aspect. If the chosen compound X is analysed
in respect to its variation within and between species, the sum of squares between species
must be 25D². Numerically we have, therefore,

Table V. Analysis of variance of the chosen compound X,
between and within species

                      Degrees of freedom    Sum of squares
  Between species             4                28588.05
  Within species             95                 1085.55
  Total                      99                29673.60

Of the total only 3.6583 per cent. is within species, and 96.3417 per cent. between species.
The compound has been chosen to maximize the latter percentage. Since, in addition to
the specific means, we have used three adjustable ratios, the variation within species
must contain only 95 degrees of freedom.

In making up the variate X, we have multiplied the original values of λ by -32.1018
in order to give to the measurement sepal length the coefficient unity. Had we used the
original values, the analysis of Table V would have appeared as:


Table VI. Analysis of variance of the crude compound X,
between and within species

                      Degrees of freedom    Sum of squares
  Between species             4                27.74160      = 25D²
  Within species             95                 1.05341      = D = S
  Total                      99                28.79501      = D(1 + 25D)

On multiplying equations (1) by λ₁, λ₂, λ₃ and λ₄ and adding, it appears that
S = Σλd = D, the specific difference in the crude compound X. The proportion (3.6 per cent.)
of the sum of squares within species could therefore have been found simply as 1/(1 + 25D).
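As a check on Table VI, the whole analysis can be rebuilt from the single number D. The sketch assumes D = 1.05341 as printed and fifty plants of each species; the factor 25 is then n₁n₂/(n₁ + n₂). Variable names are illustrative.

```python
n1 = n2 = 50
factor = n1 * n2 / (n1 + n2)    # equals 25 for fifty plants of each species

D = 1.05341                     # specific difference in the crude compound

between = factor * D ** 2       # sum of squares between species, 25 D^2
within = D                      # sum of squares within species, S = D
total = between + within        # equals D (1 + 25 D)

share_within = within / total   # should equal 1 / (1 + 25 D)
print(between, within, total, share_within)
```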







IV. The analogy of partial regression 


The analysis of Table VI suggests an analogy of some interest. If to each plant were
assigned a value of a variate y, the same for all members of each species, the analysis of
variance of y, between the portions accountable by linear regression on the measurements
$x_1, \ldots, x_4$, and the residual variation after fitting such a regression, would be identical
with Table VI, if y were given appropriate equal and opposite values for the two species.
In general, with different numbers of representatives of the two species, $n_1$ and $n_2$, if
the values of y assigned were

$$\frac{n_2}{n_1 + n_2} \quad\text{and}\quad -\frac{n_1}{n_1 + n_2},$$

differing by unity, the right-hand sides of the equations for the regression coefficients,
corresponding to equation (1), would have been

$$\frac{n_1 n_2}{n_1 + n_2}\, d_p,$$

where $d_p$ is the difference between the means of the two species in any one of the measurements.
The typical coefficient of the left-hand side would be

$$S_{pq} + \frac{n_1 n_2}{n_1 + n_2}\, d_p d_q.$$

Transferring the additional fractions to the right-hand side, we should have equations
identical with (1), save that the right-hand sides are now

$$\frac{n_1 n_2}{n_1 + n_2}\, d_p \left(1 - \Sigma \lambda' d\right),$$

where λ' stands for a solution of the new equations; hence

$$\lambda' = \frac{n_1 n_2}{n_1 + n_2}\left(1 - \Sigma \lambda' d\right)\lambda;$$

multiply these equations by d and add, so that

$$\Sigma \lambda' d = \frac{n_1 n_2}{n_1 + n_2}\, \Sigma \lambda d \left(1 - \Sigma \lambda' d\right),$$

or

$$\left(1 - \Sigma \lambda' d\right)\left(1 + \frac{n_1 n_2}{n_1 + n_2}\, \Sigma \lambda d\right) = 1,$$

and so in our example

$$1 - \Sigma \lambda' d = \frac{1}{1 + 25D}.$$

The analysis of variance of y is, therefore,

Table VII. Analysis of variance of a variate y determined exclusively by the species

                  Degrees of freedom    Sum of squares
  Regression              4                24.0854
  Remainder              95                 0.9146
  Total                  99                25.0000

The total S(y²) is clearly in general $\dfrac{n_1 n_2}{n_1 + n_2}$; the portion ascribable to regression is

$$\frac{n_1 n_2}{n_1 + n_2}\, \Sigma \lambda' d = \frac{625D}{1 + 25D}.$$

In this method of presentation the appropriate allocation of the degrees of freedom
is evident.

The multiple correlation of y with the measurements $x_1, \ldots, x_4$ is given by

$$R^2 = \frac{25D}{1 + 25D}.$$
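The entries of Table VII follow from D in the same way. The sketch below assumes D = 1.05341 and fifty plants of each species, and evaluates the regression and remainder sums of squares and the squared multiple correlation from the formulas of this section; names are illustrative.

```python
n1 = n2 = 50
factor = n1 * n2 / (n1 + n2)          # 25 in this example
D = 1.05341                           # specific difference in the crude compound

total = factor                        # S(y^2) = n1 n2 / (n1 + n2)
r_squared = factor * D / (1 + factor * D)
regression = total * r_squared        # equals 625 D / (1 + 25 D) here
remainder = total - regression
print(regression, remainder, r_squared)
```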

V. Test of significance 

It is now clear in what manner the specific difference may be tested for significance, so
as to allow for the fact that a variate has been chosen so as to maximise the distinctness of
the species. The regression of y on the four measurements is given 4 degrees of freedom,
and the residual variation 95; the value of z calculated from the sums of squares in any
one of Tables V, VI or VII is 3.2183 or

$$\tfrac12\left(\log 95 - \log 4 + \log 25 + \log D\right),$$

a very significant value for the number of degrees of freedom used.

VI. Applications to the theory of allopolyploidy

We may now consider one of the extensions of this procedure which are available when
samples have been taken from more than two populations. The sample of the third species
given in Table I, Iris virginica, differs from the two other samples in not being taken from
the same natural colony as they were, a circumstance which might considerably disturb
both the mean values and their variabilities. It is of interest in association with I. setosa
and I. versicolor in that Randolph (1934) has ascertained and Anderson has confirmed that,
whereas I. setosa is a “diploid” species with 38 chromosomes, I. virginica is “tetraploid”,
with 70, and I. versicolor, which is intermediate in three measurements, though not in
sepal breadth, is hexaploid. He has suggested the interesting possibility that I. versicolor
is a polyploid hybrid of the two other species. We shall, therefore, consider whether, when
we use the linear compound of the four measurements most appropriate for discriminating
three such species, the mean value for I. versicolor takes an intermediate value, and, if so,
whether it differs twice as much from I. setosa as from I. virginica, as might be expected,
if the effects of genes are simply additive, in a hybrid between a diploid and a tetraploid
species.

If a third value lies two-thirds of the way from one value to another, the three deviations
from their common mean must be in the ratio 4 : 1 : -5. To obtain values corresponding
with the differences between the two species we may, therefore, form linear compounds of
their mean measurements, using these numerical coefficients. The results are shown in
Table VIII where, for example, the value 7.258 cm. for sepal length is four times the mean
sepal length for I. virginica plus once the mean sepal length for I. versicolor minus five
times the value for I. setosa.

Table VIII

                               Means        Sums of squares and products of deviations

  Iris virginica.              6.588      19.8128     4.5944    14.8612     2.4056
  Fifty plants                 2.974       4.5944     5.0962     3.4976     2.3338
                               5.552      14.8612     3.4976    14.9248     2.3924
                               2.026       2.4056     2.3338     2.3924     3.6962

  Iris versicolor.             5.936      13.0552     4.1740     8.9620     2.7332
  Fifty plants                 2.770       4.1740     4.8250     4.0500     2.0190
                               4.260       8.9620     4.0500    10.8200     3.5820
                               1.326       2.7332     2.0190     3.5820     1.9162

  Iris setosa.                 5.006       6.0882     4.8616     0.8014     0.5062
  Fifty plants                 3.428       4.8616     7.0408     0.5732     0.4556
                               1.462       0.8014     0.5732     1.4778     0.2974
                               0.246       0.5062     0.4556     0.2974     0.5442

  4vi + ve - 5se               7.258     482.2650   199.2244   266.7762    53.8778
                              -2.474     199.2244   262.3842    74.3416    50.7498
                              19.158     266.7762    74.3416   286.5618    49.2954
                               8.200      53.8778    50.7498    49.2954    74.6604


Since the values for the sums of squares and products of deviations from the means
within each of the three species are somewhat different, we may make an appropriate matrix
corresponding with our chosen linear compound by multiplying the values for I. virginica
by 16, those for I. versicolor by one and those for I. setosa by 25, and adding the values
for the three species, as shown in Table VIII. The values so obtained will correspond with
the matrix of sums of squares and products within species when only two populations
have been sampled.

Using the rows of the matrix as the coefficients of four unknowns in an equation with
our chosen compound of the mean measurements, e.g.

  482.2650λ₁ + 199.2244λ₂ + 266.7762λ₃ + 53.8778λ₄ = 7.258,

we find solutions which, when multiplied by 100, are

  Coefficient of sepal length     -3.308998
                 sepal breadth    -2.759132
                 petal length      8.866048
                 petal breadth     9.392551

defining the compound measurement required.
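The construction of the last block of Table VIII and the solution for the coefficients can be sketched as follows. The weights 4, 1, -5 on the means and 16, 1, 25 on the matrices are those given in the text; the Gaussian elimination helper and all names are illustrative, and the data are the within-species values of Table VIII.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

means = {
    "virginica": [6.588, 2.974, 5.552, 2.026],
    "versicolor": [5.936, 2.770, 4.260, 1.326],
    "setosa": [5.006, 3.428, 1.462, 0.246],
}
sums = {
    "virginica": [[19.8128, 4.5944, 14.8612, 2.4056],
                  [4.5944, 5.0962, 3.4976, 2.3338],
                  [14.8612, 3.4976, 14.9248, 2.3924],
                  [2.4056, 2.3338, 2.3924, 3.6962]],
    "versicolor": [[13.0552, 4.1740, 8.9620, 2.7332],
                   [4.1740, 4.8250, 4.0500, 2.0190],
                   [8.9620, 4.0500, 10.8200, 3.5820],
                   [2.7332, 2.0190, 3.5820, 1.9162]],
    "setosa": [[6.0882, 4.8616, 0.8014, 0.5062],
               [4.8616, 7.0408, 0.5732, 0.4556],
               [0.8014, 0.5732, 1.4778, 0.2974],
               [0.5062, 0.4556, 0.2974, 0.5442]],
}
weights = {"virginica": 4, "versicolor": 1, "setosa": -5}

# Compound of the means (4, 1, -5) and of the matrices (squares: 16, 1, 25).
rhs = [sum(weights[s] * means[s][i] for s in means) for i in range(4)]
matrix = [[sum(weights[s] ** 2 * sums[s][i][j] for s in sums) for j in range(4)]
          for i in range(4)]

coef_x100 = [100 * v for v in solve(matrix, rhs)]
compound_mean = {s: sum(c * x for c, x in zip(coef_x100, means[s])) for s in means}
print(rhs, coef_x100, compound_mean)
```

The compound means computed in the last line can be compared with the first column of Table IX.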





It is now easy to find the means and variances of this compound measurement in the 
three species. These are shown in the table below (Table IX): 


Table IX

                      Mean       Sum of squares    Mean square    Standard deviation
  I. virginica      38.24827       923.7958          18.8530           4.342
  I. versicolor     22.93888       873.5119          17.8268           4.222
  I. setosa        -10.75042       292.8958           5.9775           2.444


From this table it can be seen that, whereas the difference between I. setosa and
I. versicolor, 33.69 of our units, is so great compared with the standard deviations that no
appreciable overlapping of values can occur, the difference between I. virginica and
I. versicolor, 15.31 units, is less than four times the standard deviation of each species.

The differences do seem, however, to be remarkably closely in the ratio 2:1. Compared
with this standard, I. virginica would appear to have exerted a slightly preponderant
influence. The departure from expectation is, however, small, and we have the material
for making at least an approximate test of significance.

If the differences between the means were exactly in the ratio 2:1, then the linear
function formed by adding the means with coefficients in the ratio 2 : -3 : 1 would be zero.
Actually it has the value 3.07052. The sampling variance of this compound is found
by multiplying the variances of the three species by 4, 9 and 1, adding them together
and dividing by 50, since each mean is based on fifty plants. This gives 4.8365 for the
variance and 2.199 for the standard error. Thus on this test the discrepancy, 3.071, is
certainly not significant, though it somewhat exceeds its standard error.
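This test is short enough to reproduce directly. The sketch assumes the means and mean squares of Table IX and applies the coefficients 2, -3, 1 to I. virginica, I. versicolor and I. setosa in that order; names are illustrative.

```python
import math

# Table IX: mean and mean square of the compound in each species.
mean = {"virginica": 38.24827, "versicolor": 22.93888, "setosa": -10.75042}
mean_square = {"virginica": 18.8530, "versicolor": 17.8268, "setosa": 5.9775}
coef = {"virginica": 2, "versicolor": -3, "setosa": 1}
plants_per_species = 50

# Zero if the two differences between means were exactly in the ratio 2:1.
discrepancy = abs(sum(coef[s] * mean[s] for s in mean))

# Variances weighted by the squared coefficients (4, 9, 1), divided by 50.
variance = sum(coef[s] ** 2 * mean_square[s] for s in mean) / plants_per_species
standard_error = math.sqrt(variance)
ratio = discrepancy / standard_error
print(discrepancy, variance, standard_error, ratio)
```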

In theory the test of significance is not wholly exact, since in estimating the sampling
variance of each species we have divided the sum of squares of deviations from the mean
by 49, as though these deviations had in all 147 degrees of freedom. Actually three degrees
of freedom have been absorbed in adjusting the coefficients of the linear compound so as
to discriminate the species as distinctly as possible. Had we divided by 48 instead of by
49 the standard error would have been raised by a trifle to the value 2.231, which would
not have affected the interpretation of the data. This change, however, would certainly
have been an over-correction, since it is the variances of the extreme species I. virginica
and I. setosa which are most reduced in the choice of the compound measurement, while
that of I. versicolor contributes the greater part of the sampling error in the test of
significance.

The diagram, Fig. 1, shows the actual distributions of the compound measurement
adopted in the individuals of the three species measured. It will be noticed, as was
anticipated above, that there is some overlap of the distributions of I. virginica and
I. versicolor, so that a certain diagnosis of these two species could not be based solely on
these four measurements of a single flower taken on a plant growing wild. It is not,
however, impossible that in culture the measurements alone should afford a more complete
discrimination.


Fig. 1. Frequency histograms of the discriminating linear function, for three species of Iris.


REFERENCES 

Randolph, L. F. (1934). “Chromosome numbers in native American and introduced species and cultivated
varieties of Iris.” Bull. Amer. Iris Soc. 52, 61-66.

Anderson, Edgar (1935). “The irises of the Gaspé Peninsula.” Bull. Amer. Iris Soc. 59, 2-5.

-(1936). “The species problem in Iris.” Ann. Mo. bot. Gdn. (in the Press).





THE STATISTICAL UTILIZATION OF MULTIPLE MEASUREMENTS


AUTHOR’S NOTE 

Papers 33 and 34 attempt to bring under a common point of view
diverse researches, of which the most important had been initiated
by Hotelling in the United States and by Mahalanobis in India.
The author’s own researches had approached essentially the same
problems by a technique known as discriminant functions. The results
have here been compared in a common notation, and the first
steps taken to advance the theory of discriminant functions so far
as to test their significance and the collinearity or coplanarity of
observed aggregates.

The slip alluded to on page 34.423 has been corrected in the present
edition of Paper 33, and the treatment brought into line with
that of Paper 34.


Reprinted from Annals of Eugenics, Vol. VIII, Pt. IV, pp. 376-386, 1938.





THE STATISTICAL UTILIZATION OF MULTIPLE 
MEASUREMENTS 

By R. A. FISHER


I. Introductory


It has been shown (Barnard, 1935; Fairfield Smith, 1936; Fisher, 1936) that a set of multiple
measurements may be used to provide a discriminant function, linear in the observations,
having the property that, better than any other linear function, it will discriminate between
any chosen classes such as taxonomic species, sexes, plants giving more or less
desirable progeny, and so on. Its use in applied psychology has been illustrated by Wallace
& Travers (1938).

In discussing the application of this process to a taxonomic problem, I was led to point 
out its formal analogy with the process of fitting an equation of multiple regression. The 
type of problem involved is also closely related to problems earlier discussed, on the one 
hand, by P. C. Mahalanobis (1927, 1930, 1936) and on the other by H. Hotelling (1931). 
It may, therefore, be of some value to show the connexion between these three different 
lines of work, and to distinguish between the objects for which they were developed. 

If we have samples of $N_1$ and $N_2$ objects respectively, and make $p$ measurements $x_1, \ldots, x_p$
on each, the analogy between the calculations of a discriminant function (written now with
upper affixes)

$$X = b^1 x_1 + b^2 x_2 + \ldots + b^p x_p,$$

which shall best distinguish objects of one class from those of another, and the procedure of
multiple regression, is brought out by introducing a formal dependent variate $y$, which is
given the value $N_2/(N_1 + N_2)$ for objects of the first, and $-N_1/(N_1 + N_2)$ for objects of the
second class.

These conventional values ensure that the average values of $y$ in the two classes shall
differ by unity, and that

$$S(y) = 0, \qquad S(y^2) = \frac{N_1 N_2}{N_1 + N_2} = \lambda^2,$$

where the summation is taken over all the objects observed.

The multiple regression equation for predicting the value of $y$ from observed values
$x_1, \ldots, x_p$ is now of the form

$$Y = \sum_{i=1}^{p} b^i (x_i - \bar{x}_i),$$

where the regression coefficients, $b^1, \ldots, b^p$, are given by the equations

$$\sum_{j=1}^{p} b^j S(x_i x_j) = S(x_i y) = \lambda^2 d_i,$$





where $x_i$ stands for any of the variates $x_1, \ldots, x_p$ and $d_i$ for the difference between the mean of $x_i$
in the first sample and that in the second.

To demonstrate that we may take the coefficients $b$ obtained from these regression
equations as the coefficients of the discriminant function, note that

$$S(x_i x_j) = n s_{ij} + \lambda^2 d_i d_j,$$

where $s_{ij}$ stands for the mean products of the variates $x_i$ and $x_j$ taken within the two samples,
and $n$ for the degrees of freedom within samples.

Substituting this expression for $S(x_i x_j)$ in the regression equations, they take the form

$$n \sum_{j=1}^{p} s_{ij} b^j + \lambda^2 d_i \sum_{j=1}^{p} b^j d_j = \lambda^2 d_i,$$

whence

$$n \sum_{j=1}^{p} s_{ij} b^j = \lambda^2 d_i \{1 - \Sigma(bd)\},$$

showing that the coefficients obtained differ only by the constant factor

$$\lambda^2 \{1 - \Sigma(bd)\}$$

from the solution of the equations

$$n \sum_{j=1}^{p} s_{ij} b^j = d_i$$

obtained (Fisher, 1936, p. 181) for the coefficients of the discriminant function.
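The proportionality just demonstrated may be checked numerically. The following is a minimal sketch, not the author's computation: the two small samples are hypothetical, only two variates are used so that the equations can be solved by Cramer's rule, and all function and variable names are assumptions of the sketch.

```python
# Hypothetical measurements on two variates for two small samples (illustration only).
sample1 = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6)]
sample2 = [(7.0, 3.2), (6.4, 3.2), (6.9, 3.1), (5.5, 2.3), (6.5, 2.8), (5.7, 2.8)]


def mean(rows):
    """Mean of each variate."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def products(rows, centre):
    """2x2 matrix of sums of squares and products of deviations from `centre`."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        dev = [row[0] - centre[0], row[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                out[i][j] += dev[i] * dev[j]
    return out


def solve2(a, rhs):
    """Solve the 2x2 system a x = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


n1, n2 = len(sample1), len(sample2)
lam2 = n1 * n2 / (n1 + n2)                           # lambda^2
rows = sample1 + sample2
y = [n2 / (n1 + n2)] * n1 + [-n1 / (n1 + n2)] * n2   # formal dependent variate

m1, m2 = mean(sample1), mean(sample2)
d = [m1[0] - m2[0], m1[1] - m2[1]]                   # differences of means

# Regression equations: total sums of products on the left, S(x_i y) on the right.
total = products(rows, mean(rows))
s_xy = [sum(row[i] * yv for row, yv in zip(rows, y)) for i in range(2)]
b_reg = solve2(total, s_xy)

# Discriminant equations: within-sample sums of products, differences d on the right.
w1, w2 = products(sample1, m1), products(sample2, m2)
within = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]
b_disc = solve2(within, d)

# Constant factor lambda^2 {1 - sum(b d)} connecting the two solutions.
factor = lam2 * (1 - (b_reg[0] * d[0] + b_reg[1] * d[1]))
print(b_reg, [factor * b for b in b_disc])
```

The two printed vectors agree, so that either set of equations yields the same discriminant function up to a constant multiplier.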


II. The analysis of variance 

By fitting the regression equation the variation observed in the variate $y$ has been analysed
in two portions. The sum of the products of the regression coefficients, $b$, and the right-hand
side of the regression equation, $\lambda^2 d$, is

$$\lambda^2\, \Sigma(bd),$$

and this is the portion accounted for by regression, out of the total $\lambda^2$. Consequently we
have the analysis

    Degrees of freedom        Sum of squares
    $p$                       $\lambda^2\, \Sigma(bd)$
    $N_1 + N_2 - p - 1$       $\lambda^2 \{1 - \Sigma(bd)\}$
    $N_1 + N_2 - 1$           $\lambda^2$

If $R$ stand for the multiple correlation of $y$ with $x_1, \ldots, x_p$, evidently

$$R^2 = \Sigma(bd).$$

This same quantity is the difference between the mean values of $X$ in the two samples
observed.

The table of the analysis of variance suggests, though by itself it does not demonstrate,
that the significance of $R$ could be tested by applying the ordinary z test to the analysis.
Ordinarily, in multiple regression the population postulated has a normal distribution for $y$
for each set of values $x_1, \ldots, x_p$. The distribution of the independent variates is then irrelevant.
The population postulated in our present problem has fixed values of $y$, but a simultaneous
normal distribution for $x_1, \ldots, x_p$. Hotelling’s earlier work shows, however, that the test of
significance is exactly that which the analysis of variance suggests.





III. Hotelling’s test of significance

The title of Hotelling’s paper (1931) shows that he was not concerned with estimates,
but with a test of significance. The “Generalization of ‘Student’s’ ratio” at which he
arrived is derived from the matrix $s_{ij}$ of dispersion within samples. In connexion with
this he uses the vector of differences with a factor reducing it to the precision of a single
observation; in our notation, $\lambda d_i$.

If $s^{ij}$ stand for the element corresponding with $s_{ij}$ in the reciprocal matrix, then Hotelling
chooses a form invariant for all linear transformations of $x_1, \ldots, x_p$, and puts

$$T^2 = \lambda^2 \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} d_i d_j,$$

where $n$ $(= N_1 + N_2 - 2)$ is the number of degrees of freedom within samples.

Now, from the equation

$$n \sum_{j=1}^{p} s_{ij} b^j = \lambda^2 (1 - R^2)\, d_i,$$

it follows that

$$n b^i = \lambda^2 (1 - R^2) \sum_{j=1}^{p} s^{ij} d_j.$$

Multiplying by $d_i$ and adding, these equations give

$$\Sigma(bd) = (1 - R^2)\, T^2/n,$$

or

$$T^2 = nR^2/(1 - R^2).$$

If we calculate the z test of significance from the analysis of variance, we find

$$e^{2z} = \frac{N_1 + N_2 - p - 1}{p} \cdot \frac{R^2}{1 - R^2}.$$

Substituting for $R$ in terms of $T$, and for $N_1 + N_2$ in terms of $n$, this is

$$e^{2z} = \frac{n - p + 1}{p} \cdot \frac{T^2}{n},$$

$$z = \tfrac{1}{2} \{\log T^2 + \log (n - p + 1) - \log p - \log n\},$$

with degrees of freedom $n_1 = p$, $n_2 = n - p + 1$, in accordance with the test of significance
given by Hotelling (1931, p. 377).
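The relation between $T^2$ and $R^2$, and the resulting variance ratio, may be verified on a small numerical case. The sketch below rests on assumptions: the data are hypothetical, $p = 2$ so that the systems can be solved by Cramer's rule, and the helper names are illustrative only.

```python
import math

# Hypothetical measurements on two variates for two small samples (illustration only).
sample1 = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6)]
sample2 = [(7.0, 3.2), (6.4, 3.2), (6.9, 3.1), (5.5, 2.3), (6.5, 2.8), (5.7, 2.8)]


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def products(rows, centre):
    """Sums of squares and products of deviations from `centre` (2x2)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        dev = [row[0] - centre[0], row[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                out[i][j] += dev[i] * dev[j]
    return out


def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


p = 2
n1, n2 = len(sample1), len(sample2)
n = n1 + n2 - 2                                  # degrees of freedom within samples
lam2 = n1 * n2 / (n1 + n2)
m1, m2 = mean(sample1), mean(sample2)
d = [m1[0] - m2[0], m1[1] - m2[1]]

rows = sample1 + sample2
total = products(rows, mean(rows))
w1, w2 = products(sample1, m1), products(sample2, m2)
within = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]

# R^2 = sum(b d), with b from the regression equations.
b = solve2(total, [lam2 * d[0], lam2 * d[1]])
R2 = b[0] * d[0] + b[1] * d[1]

# T^2 = lambda^2 * sum_ij s^{ij} d_i d_j, where s_ij = within / n.
u = solve2(within, d)                            # (within)^-1 d
T2 = lam2 * n * (u[0] * d[0] + u[1] * d[1])

variance_ratio = (n - p + 1) / p * T2 / n        # e^{2z}
z = 0.5 * math.log(variance_ratio)
print(R2, T2, n * R2 / (1 - R2), z)
```

The second and third printed values coincide, and the variance ratio has $p$ and $n - p + 1$ degrees of freedom.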

IV. Mahalanobis’ generalized distance 

The test appropriate for the significance of the discriminant function, that is for significant
contradiction of the hypothesis that the samples are from populations undifferentiated in
respect of the variates $x_1$ to $x_p$, was thus given by Hotelling so early as 1931. Naturally,
the scalars $T$ and $R$ give no indication of the direction in $p$-space in which the two samples
are most distinct; they do, however, indirectly measure the distance, or the extent to which
the two sets of multiple measurements differ. This is the object of a third series of researches
initiated by Mahalanobis in 1927.

If $\sigma_{ij}$ is a typical element of the dispersion matrix of the populations sampled, and $\sigma^{ij}$ is
the corresponding element of the reciprocal matrix, the property of the population of which
Mahalanobis proposes an estimate is

$$\Delta^2 = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} \delta_i \delta_j,$$

where $\delta$ stands for the difference in each variate between the population means.

This resembles Hotelling’s test of significance in being invariant for all linear transformations
of the variates $x$; evidently also it is only zero if $\delta_i$ vanishes for all values of $i$.
It differs from Hotelling’s form in being a population parameter capable of estimation.
The factor $1/p$ is a convention due to the fact that Mahalanobis, like Hotelling, was led
to investigate the subject through recognizing the shortcomings of the various forms of
coefficients of racial likeness which had been used by Pearson and his followers.

The practical estimate of $\Delta^2$ takes two forms appropriate to the cases in which the dispersion
matrix is taken as known (as are the variances in one form of Pearson’s Coefficient
of Racial Likeness), and in which it is estimated from the two samples. The first or
“unstudentized” form was investigated by R. C. Bose (1936). He used

$$D^2 = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \sigma^{ij} d_i d_j - \frac{1}{N_1} - \frac{1}{N_2}.$$

The sampling distribution found by Bose is equivalent to the limiting distribution for
the multiple correlation coefficient to which I have called attention (Fisher, 1928). If we
consider the distribution of a variate $B$ dependent on a population value, $\beta$, in such a way
that the frequency element of the distribution is

$$df = \frac{(\tfrac{1}{2}B^2)^{\frac{1}{2}(p-2)}}{\{\tfrac{1}{2}(p-2)\}!}\, e^{-\frac{1}{2}B^2 - \frac{1}{2}\beta^2} \left\{1 + \frac{\beta^2 B^2}{2p} + \frac{\beta^4 B^4}{2 \cdot 4 \cdot p(p+2)} + \ldots \right\} d(\tfrac{1}{2}B^2), \qquad \text{(B)}$$

then the distribution of the multiple correlation coefficient, $R$, calculated from a large
sample from a population having true correlation $\rho$, will be found by substituting

$$\beta^2 = n\rho^2, \qquad B^2 = nR^2,$$

the distribution being exact when $n$ is increased indefinitely, and $p$ is the number of
independent variates.

The distribution of the unstudentized statistic is found equally by making the substitutions

$$\beta^2 = \lambda^2 p \Delta^2, \qquad B^2 = \lambda^2 p D^2 + p,$$

where $\lambda^2$ stands for $N_1 N_2/(N_1 + N_2)$.

The B distribution is also, as was shown in 1928, closely linked with a double Poisson
series. A table of the 5 % points is here reproduced from my 1928 paper.

For the case, of greater practical importance, in which the dispersion matrix within
samples is not known in advance, but is replaced by $s_{ij}$ obtained by pooling the sums of
squares and products from the two samples, $D^2$ is defined by the equation

$$D^2 = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} d_i d_j,$$

in which the allowance for bias ($1/\lambda^2$) included in the unstudentized form has been dropped.
Such adjustments are, of course, unnecessary when the correct sampling distribution is
available. This point deserves emphasis, since some statisticians, unfamiliar with the use
of exact distributions, still seem to regard the discussion of bias as relevant to problems of
estimation.

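In a very brilliant research R. C. Bose & S. N. Roy have demonstrated that the distribution
of $D^2$, so defined, takes a form derivable from distribution (C) of my 1928 paper, of
which the frequency element is

$$df = \frac{\{\tfrac{1}{2}(n-1)\}!}{\{\tfrac{1}{2}(p-2)\}!\, \{\tfrac{1}{2}(n-p-1)\}!}\, (R^2)^{\frac{1}{2}(p-2)} (1 - R^2)^{\frac{1}{2}(n-p-1)}\, e^{-\frac{1}{2}\beta^2} \left\{1 + \frac{n+1}{1 \cdot p} \cdot \frac{\beta^2 R^2}{2} + \frac{(n+1)(n+3)}{1 \cdot 2 \cdot p(p+2)} \left(\frac{\beta^2 R^2}{2}\right)^2 + \ldots \right\} d(R^2), \qquad \text{(C)}$$

which reduces to distribution (B) when $n \to \infty$, and $nR^2 \to B^2$, but for finite $n$ differs from that
distribution in replacing the Bessel function by a confluent hypergeometric function.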


Table of 5% points of the distribution of B*

Values                                 Value of $n_1$
of $\beta$       1        2        3        4        5        6        7
   0.0        1.9600   2.4477   2.7955   3.0802   3.3272   3.5485   3.7506
   0.2        1.9985   2.4720   2.8140   3.0956   3.3405   3.5602   3.7613
   0.4        2.1070   2.5419   2.8680   3.1405   3.3796   3.5961   3.7930
   0.6        2.2654   2.6497   2.9533   3.2126   3.4426   3.6617   3.8446
   0.8        2.4505   2.7865   3.0640   3.3076   3.5268   3.7278   3.9144
   1.0        2.6461   2.9398   3.1941   3.4216   3.6291   3.8210   4.0005
   1.2        2.8461   3.1069   3.3386   3.5606   3.7462   3.9289   4.1008
   1.4        3.0449   3.2796   3.4936   3.6911   3.8766   4.0491   4.2134
   1.6        3.2449   3.4684   3.6561   3.8408   4.0148   4.1796   4.3363
   1.8        3.4449   3.6410   3.8246   3.9978   4.1620   4.3184   4.4681
   2.0        3.6449   3.8263   3.9976   4.1604   4.3168   4.4646   4.6074
   2.2        3.8449   4.0137   4.1743   4.3278   4.4760   4.6166   4.7631
   2.4        4.0449   4.2027   4.3639   4.4990   4.6388   4.7738   4.9043
   2.6        4.2449   4.3932   4.5369   4.6736   4.8066   4.9363   5.0603
   2.8        4.4449   4.5847   4.7199   4.8606   4.9774   5.1006   5.2204
   3.0        4.6449   4.7772   4.9065   5.0301   5.1612   5.2691   5.3840
   3.2        4.8449   4.9706   5.0926   5.2115   5.3273   5.4404   5.5508
   3.4        5.0449   5.1644   5.2809   5.3946   5.6056   5.6142   5.7204
   3.6        5.2449   5.3589   5.4703   5.6792   5.6867   5.7901   5.8924
   3.8        5.4449   5.4914   5.6606   5.7660   5.8676   5.9679   6.0665
   4.0        5.6449   5.7493   5.8516   5.9521   6.0606   6.1475   6.2426
   4.2        5.8449   5.9451   6.0434   6.1401   6.2361   6.3286   6.4204
   4.4        6.0449   6.1412   6.2369   6.3290   6.4206   6.6109   6.6998
   4.6        6.2449   6.3376   6.4288   6.5187   6.6072   6.6945   6.7806
   4.8        6.4449   6.5342   6.6223   6.7091   6.7947   6.8792   6.9626
   5.0        6.6449   6.7311   6.8162   6.9002   6.9831   7.0649   7.1467

* I am indebted to the Council of the Royal Society for permission to reproduce this table, which appeared
in Proc. Roy. Soc. A, 121, 665.






The translation of the solution of Bose & Roy into distribution C is now merely

$$\lambda^2 p D^2 = nR^2/(1 - R^2) = T^2,$$

and, as in the case of Hotelling’s formula,

$$N_1 + N_2 - 2 = n.$$

For large samples it will therefore be usually sufficient to calculate $T$, or perhaps
$T/\sqrt{1 + T^2/n}$, and to enter the $(\beta, B)$ table with this value for $B$; the value of $\beta$ for which
a significant value is just attained will give a fiducially limiting value for $\Delta\lambda\sqrt{p}$.
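The conversions among $T^2$, $R^2$ and the studentized $D^2$ are simple enough to set out as small functions. This is a sketch under stated assumptions: the function names are illustrative, and the figures (two samples of fifty, four variates, $T^2 = 120$) are hypothetical and not taken from any data in the paper.

```python
import math


def r2_from_t2(t2, n):
    """Invert T^2 = n R^2 / (1 - R^2)."""
    return t2 / (n + t2)


def entry_value(t2, n):
    """T / sqrt(1 + T^2/n), the alternative value suggested for entering the (beta, B) table."""
    return math.sqrt(t2 / (1 + t2 / n))


def studentized_d2(t2, n1, n2, p):
    """D^2 obtained from lambda^2 p D^2 = T^2."""
    lam2 = n1 * n2 / (n1 + n2)
    return t2 / (lam2 * p)


# Hypothetical figures: two samples of 50, four variates, T^2 = 120.
n1, n2, p, t2 = 50, 50, 4, 120.0
n = n1 + n2 - 2
B = entry_value(t2, n)
D2 = studentized_d2(t2, n1, n2, p)
print(B, D2)
```

It may be noted that $T/\sqrt{1 + T^2/n}$ is the same quantity as $\sqrt{n}\,R$, which is why it approaches the limiting $B$ of the table as $n$ is increased.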

This procedure brings into relief the desirability of two further extensions of the tables
available: (a) my table gives only the upper 5 % points of $B$ for given values of $\beta$, the lower values
will now also be required; (b) it would be most valuable in addition to have tables of distribution
C, in the form of [illegible] for given [illegible], or some other form which will tend to the available
limiting form given by the B table. A few suitably chosen values of $n$, such as 24, 12, 8, 6,
would doubtless suffice to show over what parts of the field the limiting distribution is of
sufficient accuracy.

V. Extension of discriminant analysis 

We have seen that the Calcutta School have elucidated the notion of generalized distance 
in fields of multiple variates, and have advanced their researches to a point at which only 
a moderate extension of existing tables is needed to apply exact tests of significance to this 
measure. Work on the discriminant functions is not so far advanced. Using the same 
geometrical analogy, the discriminant function is a unit vector specifying the direction of 
one population from another. It is true that when two populations are indistinguishable in 
respect of the measurements available, no significant estimate can be made either of the 
distance or of the direction. Hotelling’s insight thus led him to the appropriate basic test 
of significance for both problems. When, however, Hotelling’s test of significance is satisfied,
the relevant problems which suggest themselves diverge. Measuring distance, we naturally 
will ask whether one observed distance significantly exceeds another. Measuring direction, 
we shall likewise be led to test whether three or more populations are collinear, or coplanar. 
The relevance, or even urgency, of such questions in all fields in which populations are 
discriminated by multiple measurements is obvious. 

Let us suppose we have $s$ populations designated by $\pi = 1, \ldots, s$ represented by samples of
$N^\pi$ individuals. The means of the $p$ variates in sample $\pi$ will be $\bar{x}_{i\pi}$, $i = 1, \ldots, p$.

Any component of the set of possible comparisons among samples may be defined by a set
of values $\lambda^\pi$ such that

$$\sum_{\pi=1}^{s} \frac{(\lambda^\pi)^2}{N^\pi} = 1;$$

we may speak of different comparisons as orthogonal if they are specified by $\lambda$ and $\mu$ satisfying
the condition

$$\sum_{\pi=1}^{s} \frac{\lambda^\pi \mu^\pi}{N^\pi} = 0.$$





If

$$\sum_{\pi=1}^{s} N^\pi = N,$$

we may choose as our first component

$$k^\pi = N^\pi/\sqrt{N}.$$

Then $k$ will not represent a comparison among populations, but by virtue of the orthogonal
property its inclusion ensures that

$$\sum_{\pi=1}^{s} \lambda^\pi = 0,$$

so that all components after the first will, properly speaking, be comparisons among populations.
These comparisons are analogous to the second set of variates discussed
by Hotelling (1936). Hotelling, however, is considering a set of normal variates, whereas
$\lambda$, which may be regarded as a variate, varying from sample to sample, need make
no approach to a normal distribution.

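For any such comparison we may now put

$$d_i = \sum_{\pi=1}^{s} \lambda^\pi \bar{x}_{i\pi},$$

and obtain the corresponding discriminant function

$$X = \sum_{i=1}^{p} b^i x_i$$

by using the relations

$$b^i = \sum_{j=1}^{p} s^{ij} d_j = \sum_{j=1}^{p} \sum_{\pi=1}^{s} s^{ij} \lambda^\pi \bar{x}_{j\pi} = \sum_{\pi=1}^{s} \lambda^\pi b^i_\pi,$$

where

$$b^i_\pi = \sum_{j=1}^{p} s^{ij} \bar{x}_{j\pi}.$$

In the analysis above, $s^{ij}$ has its former meaning as an element of the information matrix
within samples; it is seen that $b^i_\pi$ is independent of the comparison chosen. A set of $(s - 1)$
functionally independent comparisons thus gives a set of $(s - 1)$ vectors defining the general
discriminant space appropriate to the $s$ samples.

If now

$$\sum_{i=1}^{p} b^i_\pi \bar{x}_{i\chi} = t_{\pi\chi},$$

the sum of squares among populations for any chosen comparison is

$$\sum_{i=1}^{p} b^i d_i = \sum_{i=1}^{p} \sum_{j=1}^{p} s^{ij} d_i d_j = \sum_{\pi=1}^{s} \sum_{\chi=1}^{s} \lambda^\pi \lambda^\chi t_{\pi\chi}.$$
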

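Now the set of values $\lambda^\pi/\sqrt{N^\pi}$, including $k$, may be regarded as defining a mutually orthogonal
set of direction cosines; hence, for $\pi \neq \chi$,

$$k^\pi k^\chi + \lambda^\pi \lambda^\chi + \mu^\pi \mu^\chi + \ldots = 0,$$

while for $\pi = \chi$ the corresponding sum is $N^\pi$.

Adding now the expressions for the sums of squares among populations, the total for a
complete orthogonal set of comparisons is found to be

$$\sum_{\pi=1}^{s} N^\pi t_{\pi\pi};$$

but the component $k$ gives

$$\sum_{\pi=1}^{s} \sum_{\chi=1}^{s} k^\pi k^\chi t_{\pi\chi} = \frac{1}{N} \sum_{\pi=1}^{s} \sum_{\chi=1}^{s} N^\pi N^\chi t_{\pi\chi},$$

so that the remaining $s - 1$ comparisons contain together

$$\sum_{\pi=1}^{s} N^\pi t_{\pi\pi} - \frac{1}{N} \sum_{\pi=1}^{s} \sum_{\chi=1}^{s} N^\pi N^\chi t_{\pi\chi}.$$

It follows that no component can be chosen so that

$$\sum_{\pi=1}^{s} \sum_{\chi=1}^{s} \lambda^\pi \lambda^\chi t_{\pi\chi}$$

exceeds this amount.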
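The total ascribed to the $s - 1$ comparisons among populations can be confirmed by constructing an orthogonal set explicitly. The sketch below is an illustration under assumptions: the sample sizes, means and reciprocal matrix are hypothetical figures, the orthogonal set is built by the Gram-Schmidt process (one convenient choice among many), and the names are not the author's.

```python
import math

# Hypothetical inputs (illustration only): three samples, two variates.
sizes = [5, 6, 4]                                   # N^pi
means = [[4.9, 3.3], [6.3, 2.9], [6.6, 3.0]]        # sample means, one row per sample
s_inv = [[2.0, -0.5], [-0.5, 1.5]]                  # reciprocal within-sample matrix s^{ij}

s, N = len(sizes), sum(sizes)

# t[a][b] = sum_i sum_j s^{ij} * mean_i(sample a) * mean_j(sample b)
t = [[sum(s_inv[i][j] * means[a][i] * means[b][j]
          for i in range(2) for j in range(2))
      for b in range(s)] for a in range(s)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthonormal_basis(first):
    """Complete the unit vector `first` to an orthonormal basis by Gram-Schmidt."""
    basis = [first]
    for axis in range(len(first)):
        w = [1.0 if i == axis else 0.0 for i in range(len(first))]
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-9 and len(basis) < len(first):
            basis.append([wi / norm for wi in w])
    return basis


# Direction cosines lambda^pi / sqrt(N^pi); the first vector corresponds to k.
cosines = orthonormal_basis([math.sqrt(n / N) for n in sizes])
comparisons = [[c * math.sqrt(n) for c, n in zip(vec, sizes)] for vec in cosines]


def sum_of_squares(lam):
    """Sum over pi, chi of lambda^pi lambda^chi t_{pi chi}."""
    return sum(lam[a] * lam[b] * t[a][b] for a in range(s) for b in range(s))


among = sum(sum_of_squares(lam) for lam in comparisons[1:])
expected = (sum(sizes[a] * t[a][a] for a in range(s))
            - sum(sizes[a] * sizes[b] * t[a][b]
                  for a in range(s) for b in range(s)) / N)
print(among, expected)
```

Every comparison after the first has coefficients summing to zero, and no single comparison can carry more than the printed total.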




Each comparison has $p$ degrees of freedom, so that in testing for collinearity
we may maximize the sum of squares of $X$ among samples, deduct this amount
from the total, and test whether the remaining degrees of freedom contain a
larger sum of squares than the variation within samples will account for. Likewise
coplanarity will be tested by deducting the largest pair of mutually orthogonal
components.

In choosing the component making the largest contribution it will be observed
that $s - 2$ coefficients have been adjusted; consequently the number of degrees
of freedom to be ascribed to the first component is $p + s - 2$, leaving
$(s - 2)(p - 1)$ for the remainder. Similarly, if the next largest component be separated,
it will contain $p + s - 4$ and leave $(s - 3)(p - 2)$. It will be noticed that the
sum of the arithmetic series $(p + s - 2) + (p + s - 4) + \ldots + (p - s + 2)$
adds to the total of $p(s - 1)$ degrees of freedom for the $(s - 1)$ components and
represents the partition of degrees of freedom among them if they are chosen in
order of magnitude.
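The partition of degrees of freedom described above is easily tabulated. The function below is a small sketch; its name and the chosen values of $p$ and $s$ are assumptions for illustration.

```python
def partition(p, s):
    """Degrees of freedom of the s - 1 components taken in order of magnitude."""
    return [p + s - 2 * k for k in range(1, s)]


p, s = 4, 3                       # e.g. four variates measured on three samples
parts = partition(p, s)           # first component p + s - 2, then p + s - 4, ...
remainder_after_first = p * (s - 1) - parts[0]
print(parts, sum(parts), remainder_after_first)
```

For four variates and three samples the two components take 5 and 3 degrees of freedom, together $p(s - 1) = 8$, and the remainder after the first is $(s - 2)(p - 1) = 3$.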


VI. Significant differences in direction 

The question whether the contrast in the available measurements between two
given samples differs significantly in direction from that supplied by a hypothetical
discriminant function

$$X' = \sum_{i=1}^{p} \beta^i x_i$$

may be most simply resolved by eliminating the variate $X'$, or by using the partial
variation only for all variates when $X'$ is fixed.

Thus, in the regression problem of Section I, the sum of the products with $y$
becomes

$$S(X'y) = \lambda^2 D,$$

where $D$ is the difference between the mean values of $X'$ in the two samples.

The sums of squares of $X'$ within samples may be simply expressed in terms of
the correlation within samples between $X'$ and $X$, the discriminant function
obtained from the observations; for the sum of squares within samples of $X$ is

$$\lambda^2 R^2 (1 - R^2),$$

and the sum of the products $XX'$ is

$$\lambda^2 D (1 - R^2),$$

whence, if $r$ is the correlation of $X$ with $X'$, it follows that the sum of the squares
of $X'$ is

$$\frac{\lambda^2 D^2 (1 - R^2)}{R^2 r^2};$$

whence adding $\lambda^2 D^2$ that for all observations must be

$$\frac{\lambda^2 D^2 \{1 - R^2 (1 - r^2)\}}{R^2 r^2}.$$

We may see that thus the elimination of $X'$ reduces the sum of squares for $y$ from
$\lambda^2$ to

$$\lambda^2 - (\lambda^2 D)^2 \frac{R^2 r^2}{\lambda^2 D^2 (1 - R^2 + r^2 R^2)} = \lambda^2 \left(1 - \frac{R^2 r^2}{1 - R^2 + r^2 R^2}\right) = \frac{\lambda^2 (1 - R^2)}{1 - R^2 + r^2 R^2},$$

while the portion expressible in terms of the variates $x$ is reduced from $\lambda^2 R^2$ to

$$\lambda^2 R^2 \left(1 - \frac{r^2}{1 - R^2 + R^2 r^2}\right) = \frac{\lambda^2 R^2 (1 - R^2)(1 - r^2)}{1 - R^2 + r^2 R^2}.$$

The ratio of the part to a whole has thus been changed from $R^2$ to $R^2 (1 - r^2)$.
If when so reduced the multiple correlation is no longer significant, then the
hypothetical discriminant function $X'$ is not contradicted by the data. The
whole class of discriminant functions contradicted by the data at any chosen
level of significance is thus specified simply by the correlation coefficient within
samples between the function proposed, and that calculated from the data themselves.
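The reduction of the ratio from $R^2$ to $R^2(1 - r^2)$ may be checked directly on a numerical case, by eliminating a proposed function and comparing the partial ratio with the formula. This is a sketch under assumptions: the data and the proposed coefficients `beta` are hypothetical, two variates only are used, and the names are illustrative.

```python
# Hypothetical measurements on two variates for two small samples (illustration only).
sample1 = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (4.6, 3.1), (5.0, 3.6)]
sample2 = [(7.0, 3.2), (6.4, 3.2), (6.9, 3.1), (5.5, 2.3), (6.5, 2.8), (5.7, 2.8)]


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def products(rows, centre):
    """Sums of squares and products of deviations from `centre` (2x2)."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for row in rows:
        dev = [row[0] - centre[0], row[1] - centre[1]]
        for i in range(2):
            for j in range(2):
                out[i][j] += dev[i] * dev[j]
    return out


def solve2(a, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
            (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det]


def quad(u, m, v):
    """Bilinear form u' m v."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


n1, n2 = len(sample1), len(sample2)
lam2 = n1 * n2 / (n1 + n2)
m1, m2 = mean(sample1), mean(sample2)
d = [m1[0] - m2[0], m1[1] - m2[1]]
rows = sample1 + sample2
total = products(rows, mean(rows))
w1, w2 = products(sample1, m1), products(sample2, m2)
within = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]

b = solve2(total, [lam2 * d[0], lam2 * d[1]])    # observed discriminant X
R2 = b[0] * d[0] + b[1] * d[1]

beta = [1.0, -0.5]                               # hypothetical proposed function X'
D = beta[0] * d[0] + beta[1] * d[1]              # difference of the means of X'

# Squared correlation within samples between X and X'.
r2 = quad(b, within, beta) ** 2 / (quad(b, within, b) * quad(beta, within, beta))

# Squared correlation of y with X' over all observations, then the partial ratio.
R2_proposed = (lam2 * D) ** 2 / (lam2 * quad(beta, total, beta))
partial_R2 = (R2 - R2_proposed) / (1 - R2_proposed)
print(partial_R2, R2 * (1 - r2))
```

The two printed values agree, so that the test of a proposed function requires only $R^2$ and the correlation $r$ within samples.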

Example. Four measurements on the flowers of fifty plants each from the
species Iris versicolor and I. setosa (Fisher, 1936, p. 180) give a ratio of the sums
of squares

$$R^2 = 0.963416,$$

for 4 against 95 degrees of freedom. The 1% $z(3, 95)$ is 0.6926, and the corresponding
variance ratio is 3.995. Here $R^2$ is

$$\frac{3 \times 3.995}{95 + 3 \times 3.995} = 0.112025.$$

The ratio of $R^2$ required for significance at the 1% level, to the value of $R^2$ observed,
is

$$1 - r^2 = 0.116279,$$

where

$$r = 0.94006$$

gives the minimal value of the correlation within populations, such that any
discriminant function proposed having a lower correlation than this may be rejected
at the 1% level. This is a convenient and direct measure of the precision
of the discriminant function as estimated.
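The figures of this example can be retraced step by step. The sketch below uses only the quantities quoted in the text (the observed $R^2$, the 1% point of $z$, and the rounded variance ratio 3.995); the variable names are assumptions of the sketch.

```python
import math

R2_observed = 0.963416     # ratio of sums of squares quoted in the example
z_one_percent = 0.6926     # 1% point of z for 3 and 95 degrees of freedom, as quoted
variance_ratio = 3.995     # rounded value of exp(2z) used in the text
df1, df2 = 3, 95

# R^2 just significant at the 1% level after eliminating the proposed function.
R2_required = df1 * variance_ratio / (df2 + df1 * variance_ratio)

one_minus_r2 = R2_required / R2_observed
r_minimal = math.sqrt(1 - one_minus_r2)

print(f"exp(2z)      = {math.exp(2 * z_one_percent):.4f}")
print(f"R^2 required = {R2_required:.6f}")
print(f"1 - r^2      = {one_minus_r2:.6f}")
print(f"r            = {r_minimal:.5f}")
```

Any proposed function whose correlation within samples with the observed discriminant falls below this value of $r$ would be rejected at the 1% level.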





VII. Summary 


The results of three independent lines of research on the treatment of multiple measurements
are set out in a consistent notation. The method is extended to the examination
of collinearity and coplanarity of samples, and to testing the significance of deviations in
direction.


REFERENCES 

M. M. Barnard (1935). “The secular variations of skull characters in four series of Egyptian skulls.” Ann.
Eugen., Lond., 6, 352-71.

R. C. Bose (1936). “On the exact distribution of D² statistic.” Sankhya, 2, 143-54.

R. C. Bose & S. N. Roy (1938). “The exact distribution of the Studentized statistic.” Sankhya, 3, pt. 4.

H. Fairfield Smith (1936). “A discriminant function for plant selection.” Ann. Eugen., Lond., 7, 240-50.

R. A. Fisher (1928). “The general sampling distribution of the multiple correlation coefficient.” Proc. Roy.
Soc. A, 121, 654-73.

-(1936). “The use of multiple measurements in taxonomic problems.” Ann. Eugen., Lond., 7, 179-88.

H. Hotelling (1931). “The generalization of ‘Student’s’ ratio.” Ann. Math. Statist. 2, 360-78.

-(1936). “Relations between two sets of variates.” Biometrika, 28, 321-77.

P. C. Mahalanobis (1927). “Analysis of race mixture in Bengal.” J. Asiat. Soc. Beng. 23, 301-33.

-(1930). “On tests and measures of group divergence. Part 1. Theoretical formulae.” J. Asiat. Soc. Beng.
26, 541-88.

-(1936). “On the generalized distance in statistics.” Proc. Nat. Inst. Sci. Ind. 12, 49-66.

N. Wallace & R. M. W. Travers (1938). “A psychometric sociological study of a group of speciality salesmen.”
Ann. Eugen., Lond., 8, 266-302.





34 

THE PRECISION OF DISCRIMINANT 
FUNCTIONS 


See Author's Note, Paper 33. 


Reprinted from Annals of Eugenics, Vol. X, Pt. IV, pp. 422—429, 1940. 





THE PRECISION OF DISCRIMINANT FUNCTIONS 
By R. A. FISHER

1. Introductory

In a paper (1938a) on “The statistical utilization of multiple measurements” the author 
considered the general procedure of the establishment of discriminant functions, or sets of 
scores, based on an analysis of covariance, for a battery of different experimental
determinations. In general, these functions are those giving stationary values to the ratio of
apportionment of sums of squares between $n_1$ chosen and $n_2$ residual degrees of freedom.
In the simplest application $n_1$ is unity, and, as was first shown by Hotelling, the primary
test of significance as to whether the set of measurements available are effective in making 
any significant discrimination, is exactly reducible to a simple analysis of variance in which 
p — 1 degrees of freedom (for p variates) are transferred from the residual. In the general 
case the underlying problem of distribution has also now been solved (Fisher, 1939). 

In both the simpler and in more general cases the question of the precision of the scores so 
ascertained is of immediate importance. Obviously, this presents certain peculiar features. 
If all coefficients of a discriminant function are increased or decreased in proportion, the 
function is effectively unchanged. No standard error can therefore be assigned to such a 
coefficient considered singly. Partial standard errors, on the other hand, in which all other 
coefficients are given fixed values, will certainly exist, although it is not at first sight obvious 
how they should be calculated. 

In the case of the coefficients of a multiple regression equation, the author has often felt 
that the total standard errors ordinarily calculated were somewhat artificial, and certainly 
they are frequently misinterpreted. Thus, in the prediction of capacity to resist high altitude
from data on individuals obtainable at sea level, it appeared in a recent study (Fisher, 1938b),
that when seven sea-level characteristics were employed in the prediction, not one of the
coefficients was significant, although an apparently good prediction was obtained from the
multiple regression formula. All that the non-significance meant, however, was that if any
one of the coefficients were given the value zero and the other coefficients readjusted, the
prediction formula was not significantly impaired. The sea-level characteristics showed, in
fact, sufficiently close mutual correlation for any one of them to be capable of replacement 
by an appropriate linear function of the others, so as to compensate nearly completely for 
its absence from the prediction formula. Actually a prediction based on only four sea-level 
values was preferable to one based on all seven. Similar situations often arise in economics. 

It is clear from this example that all questions relevant to the precision of the coefficients
of a multiple regression formula may be expressed comprehensively in terms of a rule or
test of significance as to whether any alternative formula proposed is significantly
contradicted by the data. For multiple regression such a test is immediately available by
multiplying the rows and columns of the c-matrix by the deviations between the coefficients
arbitrarily chosen and those evaluated empirically.

In the paper referred to (1938a), I applied this concept of a generalized test of significance
applicable to any function arbitrarily proposed, and showed that the sum of squares
corresponding with Hotelling’s $T^2$ became $T^2(1 - r^2)$, where $r$ is the correlation between the
discriminant function proposed and that indicated by the data, within the samples which it is
proposed to discriminate. This I carelessly interpreted to mean that Hotelling’s $T^2$ was
simply reduced to $T^2(1 - r^2)$, forgetting that, in Hotelling’s notation, $T^2$ also appears in
the total sum of squares, which is unaffected. The numerical example, p. 386, is therefore
incorrect, and the inconsistency of my formula has been pointed out by Bartlett (1939).
The form in which Bartlett expresses the true relationship is, however, that my formula
is correct if for the correlation within samples is substituted the correlation obtained when
both samples are thrown together. This is true, but confusing, for while the correlation within
homogeneous groups is an appropriate and natural method for measuring the similarity of
two linear functions of the observations, the correlation when heterogeneous material is
thrown together is of no intrinsic interest, its application being limited to the particular pair
of samples under test.

The following section gives a simple demonstration of the correct formula from two 
complementary standpoints, with a view to exhibiting how the two correlations in question 
are related in any particular batch of data. 

2. The test of significance of a proposed discriminant

In testing the significance of a discriminant function built of a number of different variates
$x_1, \ldots, x_p$ the analysis of variance appears in two different guises. We may consider the
analysis of variance of a dummy variate $y$ distinguishing the two contrasted samples,
dividing the portion expressible in terms of $x_1, \ldots, x_p$ as independent variates, from a residue
not so expressible. Alternately, regarding our discriminant itself as a variate, we may
analyse its variation between and within the samples.

Thus, if, for samples of $N_1$ and $N_2$, we take

$$y = N_2/(N_1 + N_2)$$

for objects of the first sample, and

$$y = -N_1/(N_1 + N_2)$$

for objects of the second sample, then the expected value of $y$ for given values of $x_1, \ldots, x_p$
will be

$$Y = X = b_1 x_1 + \ldots + b_p x_p,$$

in which the coefficients $b$ are given by the equations

$$S_j\{b_j\,S(x_i x_j)\} = S(x_i y) = A^2 d_i,$$

where

$$A^2 = N_1 N_2/(N_1 + N_2),$$

and $d_i$ is the difference between the means of the samples for variate $i$.




The analysis of variance for $y$ is then

              Degrees of freedom     Sum of squares
Regression    $p$                    $S(Y^2) = A^2 S(bd) = A^2 R^2$
Remainder     $N_1 + N_2 - p - 1$    $S(y - Y)^2 = A^2\{1 - S(bd)\} = A^2(1 - R^2)$
Total         $N_1 + N_2 - 1$        $S(y^2) = A^2$


On the other hand, considering $X$ as a variate, we have

$$S(X^2) = A^2 R^2,$$

and if $\bar X_1$, $\bar X_2$ are the means of $X$ in the two samples,

$$A^2(\bar X_1 - \bar X_2) = S(Xy) = A^2 R^2.$$

So that for the analysis of $X$ we have

                   Sum of squares
Between samples    $A^2 R^4$
Within samples     $A^2 R^2(1 - R^2)$
Total              $A^2 R^2$

an analysis equivalent, apart from a constant factor, to the first.

Consider now any proposed form

$$\xi = S(\beta_i x_i)$$

for the true discriminant of the population from which our sample is drawn. We shall be
interested in its correlation with $X$ within samples, denoted by $r$, rather than the total
correlation $r'$ when both samples are thrown together. If, however,

$$S(\xi^2) = A'^2,$$

it follows that

$$S(\xi X) = AA'Rr'.$$

Since, moreover, $Y$ is the multiple regression prediction formula,

$$S(\xi y) = S(\xi Y),$$

whence

$$S(\xi y) = AA'Rr'.$$

From this it follows that

$$A^2(\bar\xi_1 - \bar\xi_2) = AA'Rr',$$

and that the sum of squares between samples is

$$A^2(\bar\xi_1 - \bar\xi_2)^2 = A'^2 R^2 r'^2.$$

Hence the analysis for $\xi$ may be completed, with the corresponding values for covariance
with $X$, as follows:

                   Degrees of freedom    Sum of squares ($\xi^2$)    Sum of products ($X\xi$)
Between samples    1                     $A'^2 R^2 r'^2$             $AA'R^3 r'$
Within samples     $N_1 + N_2 - 2$       $A'^2(1 - R^2 r'^2)$        $AA'Rr'(1 - R^2)$
Total              $N_1 + N_2 - 1$       $A'^2$                      $AA'Rr'$






The correlation coefficient $r$ within samples is therefore given by

$$r^2 = \frac{r'^2(1 - R^2)}{1 - R^2 r'^2}.$$


Thus, the class of formulae specified by a fixed value of the correlation within samples has 
also a fixed value for the correlation when the samples are thrown together. Whereas, 
however, for any chosen formula, r is an intrinsic property of homogeneous populations, 
both R and r' will depend on the relative, and absolute, sizes of the samples. 
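
The relation between the two correlations may be checked numerically. The following is only an illustrative sketch: the function names and the trial value of $r'^2$ are assumptions introduced here, and the value of $R^2$ is the one quoted in the text for the iris example.

```python
def within_r2(r_total2, R2):
    """r^2 within samples, from r'^2 (both samples thrown together)."""
    return r_total2 * (1.0 - R2) / (1.0 - R2 * r_total2)


def total_r2(r_within2, R2):
    """Inverse relation: r'^2 recovered from the within-sample r^2."""
    return r_within2 / (1.0 - R2 * (1.0 - r_within2))


R2 = 0.963416        # value quoted in the text for the iris discriminant
r_total2 = 0.99      # hypothetical r'^2, chosen only for illustration

r2 = within_r2(r_total2, R2)
print(round(r2, 4), round(total_r2(r2, R2), 4))
```

With these trial figures a total correlation very near unity corresponds to a within-sample $r^2$ of roughly 0.78, which illustrates why the two measures should not be confused when $R^2$ is large.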

If now $\xi$ were used to predict $y$, we have for the analysis of $y$

              Degrees of freedom    Sum of squares
Prediction    1                     $A^2 R^2 r'^2 = A^2 R^2 r^2/\{1 - R^2(1 - r^2)\}$
Remainder     $N_1 + N_2 - 2$       $A^2(1 - R^2 r'^2) = A^2(1 - R^2)/\{1 - R^2(1 - r^2)\}$
Total         $N_1 + N_2 - 1$       $A^2$


and comparing this with prediction based on all $p$ independent variates, we have, for testing
the significance of the contribution of the others, after $\xi$ has been taken account of,

                          Degrees of freedom     Sum of squares
Additional information    $p - 1$                $A^2 R^2(1 - R^2)(1 - r^2)/\{1 - R^2(1 - r^2)\}$
Remainder                 $N_1 + N_2 - p - 1$    $A^2(1 - R^2)$
Total                     $N_1 + N_2 - 2$        $A^2(1 - R^2)/\{1 - R^2(1 - r^2)\}$


A similar analysis is found for $X$, if we eliminate covariance with $\xi$.

The modification of Hotelling’s test needed when we wish to examine whether the
discriminant indicated by the data differs significantly from any proposed form consists then
in (i) reducing the number of degrees of freedom by unity, and (ii) substituting

$$R'^2 = R^2(1 - r^2)$$

for $R^2$ as the ratio of the part to the whole in the sums of squares.

We should then reject any proposed discriminant formula, if its correlation $r$ within
samples with the best discriminant function obtainable is so low that

$$\frac{n-p+1}{p-1}\cdot\frac{R'^2}{1-R'^2}
  = \frac{n-p+1}{p-1}\cdot\frac{R^2(1-r^2)}{1-R^2(1-r^2)}
  = \frac{n-p+1}{p-1}\cdot\frac{T^2(1-r^2)}{n+T^2 r^2}$$

is significant for $n_1 = p - 1$, $n_2 = n - p + 1$.

The corrected rule gives a much more reasonable basis for rejection. Thus the discriminant
on four flower measurements for Iris versicolor and I. setosa (Fisher, 1936, p. 184) gives

$$R^2 = 0.963416$$

for 4 against 95 degrees of freedom. For $n_1 = 3$, $n_2 = 95$, the 5 % value of $z$ is 0.4968, the
variance ratio is 2.7015. Multiplying by 3/95, we have the ratio of sums of squares,
$R^2(1 - r^2)/\{1 - R^2(1 - r^2)\}$;

hence $$R^2(1 - r^2) = 0.078606.$$

Dividing by $R^2$ it appears that the limiting value of $r$ is given by

$$1 - r^2 = 0.081591$$

or $$r = 0.95834.$$

The precision with which the coefficients of the discriminant function have been
determined is thus sufficient to reject at the 5 % level of significance any formula having a
correlation with that found less than 0.95834, within the species. In this way we have a
comprehensive and appropriate measure of the precision with which the discriminant
function has been determined by the data.
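
The arithmetic of this example can be retraced step by step. The sketch below is illustrative only; the function name is an assumption introduced here, and the inputs are the figures quoted in the text ($R^2$, the 5 % variance ratio, and the degrees of freedom 3 and 95).

```python
import math


def limiting_within_sample_r(R2, variance_ratio, df_num, df_den):
    """Smallest within-sample correlation r not rejected by the test."""
    ss_ratio = variance_ratio * df_num / df_den   # R'^2 / (1 - R'^2)
    R2_prime = ss_ratio / (1.0 + ss_ratio)        # R'^2 = R^2 (1 - r^2)
    one_minus_r2 = R2_prime / R2
    return ss_ratio, R2_prime, one_minus_r2, math.sqrt(1.0 - one_minus_r2)


ss_ratio, R2_prime, one_minus_r2, r_limit = limiting_within_sample_r(
    R2=0.963416, variance_ratio=2.7015, df_num=3, df_den=95
)
print(ss_ratio, R2_prime, one_minus_r2, r_limit)
```

Any proposed formula whose within-species correlation with the fitted discriminant falls below `r_limit` would be rejected at the 5 % level under this rule.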

3. Discriminant functions based on non-linear equations 
The method of approach used in the present paper, in which the precision of the coefficients 
of a discriminant function is discussed through a test of significance of deviations from the 
hypothesis that the function has some other assigned form, brings clearly to view the 
complications that arise when more than a single degree of freedom is maximized. 

For example, in a contingency table individuals are cross classified in two categories, 
such as eye colour and hair colour, as in the following example (Tocher’s data for Caithness 
compiled by K. Maung of the Galton Laboratory). 


                              Hair colour
Eye colour    Fair    Red    Medium    Dark    Black    Total
Blue           326     38       241     110        3      718
Light          688    116       584     188        4     1580
Medium         343     84       909     412       26     1774
Dark            98     48       403     681       85     1315
Total         1455    286      2137    1391      118     5387


Variation among the four eye colours may be regarded as due to variations in three 
variates defined conveniently in some such way as the following: 


Eye colour    $x_1$    $x_2$    $x_3$
Blue            0        0        0
Light           1        0        0
Medium          0        1        0
Dark            0        0        1


We may then ask for what eye colour scores, i.e. for what linear function of $x_1$, $x_2$, $x_3$,
are the five hair colour classes most distinct. The answer may be found in a variety of ways.
For example, by starting with arbitrarily chosen scores for eye colour, determining from
these average scores for hair colour, and using these latter to find new scores for eye colour.
Apart from a contraction of scale by a factor for each completed cycle, this form tends to
a limit, and yields scores such as the following:


Eye colour    $x$          Hair colour    $y$
Light         -0.9873      Fair           -1.2187
Blue          -0.8968      Red            -0.5226
Medium         0.0753      Medium         -0.0941
Dark           1.5743      Dark            1.3189
                           Black           2.4518


The particular values given above have been standardized so as to have mean values 
zero, and mean square deviations unity. In the sample from which they are derived each 
score has a linear regression on the other, the regression coefficient being 0.44627; this is,
of course, equal to the correlation coefficient between the two scores regarded as variates. 
Hotelling has called pairs of functions of this kind canonical components. It may be noticed 
that no assumption is introduced as to the order of the classes of each category. In Tocher’s 
schedule Light eyes come between Blue and Medium, but the discriminant function puts 
Blue between Medium and Light, though near the latter. 
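
The cyclic process described above (scores for one category replaced by class averages of the scores for the other, and back again) can be sketched as follows. This is an illustration under stated assumptions, not the computation actually used for the paper: the cell counts are the commonly reproduced Caithness figures, the starting scores 0, 1, 2, 3 are arbitrary, and the scores are re-standardized after every half-cycle instead of being allowed to contract in scale. The function names are introduced here for illustration.

```python
# Rows: eye colour (Blue, Light, Medium, Dark).
# Columns: hair colour (Fair, Red, Medium, Dark, Black).
TABLE = [
    [326, 38, 241, 110, 3],
    [688, 116, 584, 188, 4],
    [343, 84, 909, 412, 26],
    [98, 48, 403, 681, 85],
]


def standardize(scores, weights):
    """Rescale to weighted mean zero and weighted mean square deviation unity."""
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, scores)) / total
    centred = [s - mean for s in scores]
    msd = sum(w * c * c for w, c in zip(weights, centred)) / total
    return [c / msd ** 0.5 for c in centred]


def reciprocal_averaging(table, start, cycles=200):
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x = standardize(start, row_tot)
    for _ in range(cycles):
        # Average eye score within each hair colour class.
        y = [sum(table[i][j] * x[i] for i in range(len(x))) / col_tot[j]
             for j in range(len(col_tot))]
        y = standardize(y, col_tot)
        # Average hair score within each eye colour class.
        x = [sum(table[i][j] * y[j] for j in range(len(y))) / row_tot[i]
             for i in range(len(row_tot))]
        x = standardize(x, row_tot)
    corr = sum(table[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y))) / n
    return x, y, corr


eye_scores, hair_scores, correlation = reciprocal_averaging(TABLE, [0, 1, 2, 3])
print([round(s, 4) for s in eye_scores])
print([round(s, 4) for s in hair_scores])
print(round(correlation, 4))
```

The limiting scores do not depend on the starting values, provided these are not constant over the classes; only the sign convention is fixed by the start.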

The precision of the scores assigned to different eye colours must be judged by the
conformity of the data to various possible hypotheses concerning these scores. For example, we
might test the hypothesis that the hair colour scores are correct, but that the apparent
difference in score between Light and Blue eyes is illusory, their true scores being the
same. The blue-eyed and the light-eyed children may here be compared directly, using the
variate $y$:

Blue-eyed

                  Frequency $f$    Score $y$    $fy$        $fy^2$
Fair                   326         -1.2187      -397.30      484.2
Red                     38         -0.5226       -19.86       10.4
Medium                 241         -0.0941       -22.68        2.1
Dark                   110          1.3189       145.08      191.3
Black                    3          2.4518         7.36       18.0
Total                  718                      -287.40      706.0
$(S fy)^2/S f$                                               115.0

$\bar y = -0.40028$,    $S f(y - \bar y)^2 = 591.0$


Light-eyed

                  Frequency $f$    Score $y$    $fy$        $fy^2$
Fair                   688         -1.2187      -838.47     1021.8
Red                    116         -0.5226       -60.61       31.7
Medium                 584         -0.0941       -54.95        5.2
Dark                   188          1.3189       247.95      327.0
Black                    4          2.4518         9.81       24.0
Total                 1580                      -696.27     1409.7
$(S fy)^2/S f$                                               306.8

$\bar y = -0.44068$,    $S f(y - \bar y)^2 = 1102.9$











The sum of squares for error is 1693.9 for 2296 degrees of freedom, giving a mean square
0.73776: dividing this by 718 and 1580 we have 0.001028 and 0.000467, so that the variance
of the difference between the scores is 0.001495 and the standard error 0.03867. The actual
difference 0.04040 is therefore not significant.
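
The comparison of the two classes may be retraced from the frequencies and the hair colour scores. The sketch below is illustrative; the variable and function names are assumptions introduced here, and the scores are the rounded values tabulated in the text, so that the last figure may differ slightly.

```python
import math

HAIR_SCORES = [-1.2187, -0.5226, -0.0941, 1.3189, 2.4518]  # Fair ... Black
BLUE = [326, 38, 241, 110, 3]
LIGHT = [688, 116, 584, 188, 4]


def mean_and_ss(freq, scores):
    """Number, mean score, and sum of squares of deviations about the mean."""
    n = sum(freq)
    total = sum(f * y for f, y in zip(freq, scores))
    raw_ss = sum(f * y * y for f, y in zip(freq, scores))
    return n, total / n, raw_ss - total * total / n


n_b, mean_b, ss_b = mean_and_ss(BLUE, HAIR_SCORES)
n_l, mean_l, ss_l = mean_and_ss(LIGHT, HAIR_SCORES)

error_ms = (ss_b + ss_l) / (n_b + n_l - 2)          # pooled mean square
std_error = math.sqrt(error_ms / n_b + error_ms / n_l)
difference = mean_b - mean_l

print(round(error_ms, 5), round(std_error, 5), round(difference, 5))
```

The difference is close to one standard error, which agrees with the conclusion that it is not significant.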

In general, if we wish to compare the observed scores, derived from the data, with any
proposed values $\xi$, we may test the linearity of the regression of $y$ on $\xi$.

Thus, if $\xi$ takes the values 0, 0, 1, 2 in the four classes for eye colour, we have

$S(\xi) = 4404$,                    $\bar\xi = 0.81752$,
$S(\xi^2) = 7034$,                  $S(\xi - \bar\xi)^2 = 3433.63$,
$S(y\xi) = R\,S(x\xi) = 1907.83$,
$S(y^2) = 5387.00$,
$S(y^2) - S^2(y\xi)/S(\xi - \bar\xi)^2 = 4326.95$.

Now the sum of squares for $y$ within arrays is $5387(1 - R^2) = 4313.67$. So the analysis can
be set out as follows:

                                     D.F.    S.S.       Mean square
Deviations from linear regression       2      13.28    6.64
Within arrays                        5383    4313.67    0.80135
Total                                5385    4326.95



Thus the data show a decidedly significant departure from linearity. So that, if the scores
for hair colour, $y$, be accepted, the data contradict significantly any set of values for $\xi$ for
which not only are Light and Blue eyes given equal scores, but Medium eyes are placed
exactly half-way between those and Dark.
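
The test of linearity may be sketched as below. This is an illustration only: the cell counts are the commonly reproduced Caithness figures, the hair colour scores are the rounded values tabulated earlier, and all names are assumptions introduced here. Because of rounding in the scores, the results agree with the figures in the text only to about the second decimal place.

```python
# Rows: eye colour (Blue, Light, Medium, Dark); columns: hair colour.
TABLE = [
    [326, 38, 241, 110, 3],
    [688, 116, 584, 188, 4],
    [343, 84, 909, 412, 26],
    [98, 48, 403, 681, 85],
]
HAIR_SCORES = [-1.2187, -0.5226, -0.0941, 1.3189, 2.4518]
XI = [0, 0, 1, 2]                       # proposed eye colour values

row_tot = [sum(r) for r in TABLE]
n = sum(row_tot)
row_sum_y = [sum(f * y for f, y in zip(r, HAIR_SCORES)) for r in TABLE]
sum_y = sum(row_sum_y)

# Total sum of squares of y about its mean.
ss_y = sum(sum(col) * y * y for col, y in zip(zip(*TABLE), HAIR_SCORES)) - sum_y ** 2 / n

# Sum of squares of xi and sum of products of y and xi, about the means.
s_xi = sum(t * x for t, x in zip(row_tot, XI))
ss_xi = sum(t * x * x for t, x in zip(row_tot, XI)) - s_xi ** 2 / n
sp_y_xi = sum(sy * x for sy, x in zip(row_sum_y, XI)) - s_xi * sum_y / n

between_arrays = sum(sy * sy / t for sy, t in zip(row_sum_y, row_tot)) - sum_y ** 2 / n
within_arrays = ss_y - between_arrays
regression = sp_y_xi ** 2 / ss_xi
deviations = between_arrays - regression    # 2 degrees of freedom

ms_deviations = deviations / 2
ms_within = within_arrays / (n - len(TABLE))
print(round(deviations, 2), round(within_arrays, 2), round(ms_deviations / ms_within, 2))
```

The ratio of the two mean squares is the variance ratio on 2 and 5383 degrees of freedom by which the departure from linearity is judged.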

The consistency of these two methods may be illustrated by finding the contributions to 
the analysis above of two separate components. Of these one is the discrepancy between the 
means for Blue- and Light-eyed children, while the second is found by taking the means of 
Blue and Light together, adding the mean for the Dark-eyed, and comparing the sum with 
twice the mean of the Medium-eyed. 

For the first comparison, we have the difference between Blue- and Light-eyed children,
0.04040; dividing the square of this by the sum of the reciprocals of 718 and 1580, we have
0.806 as the contribution of this component to the sum of squares.

For the second component we have

                  Number    $S(y)$      Mean        Reciprocal
Blue and Light      2298    -983.73     -0.42808    0.00043516
Medium              1774      59.63      0.03361    0.00056370
Dark                1315     924.10      0.70274    0.00076046
Discrepancy                    0.00      0.20744    0.00345042









The divisor now is

$$\frac{1}{2298} + \frac{4}{1774} + \frac{1}{1315} = 0.00345042;$$

dividing the square of 0.20744 by this, we have 12.471 as the contribution of the second
chosen component. The two components together give 13.28, checking with the value
obtained for deviations from linear regression in the analysis of variance. The two
discrepancies may thus be tested separately in succession. The significance of the two degrees
of freedom is clearly due only to the second component.
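
The two single components can be computed as contrasts among class means, each squared and divided by the sum of the squared coefficients over the class numbers. The sketch below is illustrative; the cell counts are the commonly reproduced Caithness figures, the scores are the rounded tabulated values, and the function names are assumptions introduced here.

```python
TABLE = [
    [326, 38, 241, 110, 3],     # Blue
    [688, 116, 584, 188, 4],    # Light
    [343, 84, 909, 412, 26],    # Medium
    [98, 48, 403, 681, 85],     # Dark
]
HAIR_SCORES = [-1.2187, -0.5226, -0.0941, 1.3189, 2.4518]


def class_mean(rows):
    """Number and mean hair colour score for one or more pooled eye classes."""
    n = sum(sum(r) for r in rows)
    total = sum(f * y for r in rows for f, y in zip(r, HAIR_SCORES))
    return n, total / n


def contrast_ss(groups, coeffs):
    """Square of a contrast of means, divided by S(c^2 / n)."""
    value = sum(c * mean for c, (_, mean) in zip(coeffs, groups))
    divisor = sum(c * c / n for c, (n, _) in zip(coeffs, groups))
    return value * value / divisor


blue = class_mean([TABLE[0]])
light = class_mean([TABLE[1]])
blue_and_light = class_mean(TABLE[:2])
medium = class_mean([TABLE[2]])
dark = class_mean([TABLE[3]])

first = contrast_ss([blue, light], [1, -1])
second = contrast_ss([blue_and_light, medium, dark], [1, -2, 1])
print(round(first, 3), round(second, 3), round(first + second, 2))
```

Since the two contrasts are orthogonal, their contributions add to the sum of squares for the two degrees of freedom.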

It might seem that the problem which we have discussed for simple discriminant analysis
was not analogous to that examined above, but to the wider question whether the data
are compatible with the chosen values $\xi$ for $x$, together with any set of scores for hair colour.
In considering this problem, however, we must remember that there are three pairs of
canonical components with corresponding correlations. If for the remaining two of them
the correlation is insignificant, the corresponding components are presumably arbitrary, so
that no significant deviation is to be expected from any $\xi$ arbitrarily assigned. The practical
question must involve the further stipulation that the correlation corresponding to our
chosen component shall be the largest of the three possible values. Such a problem is not
likely to have any easy solution.


REFERENCES

R. A. Fisher (1936). “The use of multiple measurements in taxonomic problems.” Ann. Eugen., Lond., 7,
179-88.

R. A. Fisher (1938a). “The statistical utilization of multiple measurements.” Ann. Eugen., 8, 376-86.

R. A. Fisher (1938b). “On the statistical treatment of the relation between sea-level characteristics and
high-altitude acclimatization.” Proc. Roy. Soc. B, 126, 25-9.

R. A. Fisher (1939). “The sampling distribution of some statistics obtained from non-linear equations.”
Ann. Eugen., Lond., 9, 238-49.

M. S. Bartlett (1939). “The standard errors of discriminant function coefficients.” J. Roy. Statist. Soc.
Suppl. 6, 169-73.





35 

THE COMPARISON OF SAMPLES WITH 
POSSIBLY UNEQUAL VARIANCES 


AUTHOR’S NOTE 

In republishing this paper I should like to invite the reader’s attention
to the first section, in which the logic of the test is discussed.
The principles brought to light seem to the author essential to the
theory of tests of significance in general, and to have been most
unwarrantably ignored in at least one pretentious work on “Testing
statistical hypotheses.”¹ Practical experimenters have not been
seriously influenced by this work, but in mathematical departments,
at a time when these were beginning to appreciate the part they
might play as guides in the theoretical aspects of experimentation,
its influence has been somewhat retrograde.

With respect to the particular problem, first discussed by Behrens,²
who arrived, I believe, essentially at the right solution, the origin of
the controversy may be distinctly recognised. Pearson and Neyman
have laid it down axiomatically that the level of significance of a
test must be equated to the frequency of a wrong decision “in
repeated samples from the same population.” This idea was foreign
to the development of tests of significance given by the author in
1925,³ for the experimenter’s experience does not consist in repeated
samples from the same population, although in simple cases the
numerical values are often the same; and it was, I believe, this
coincidence of values in simple cases which misled Pearson and Neyman,
who were not very familiar with the ideas of “Student” and the
author. It was obvious from the first, and particularly emphasised by
the present writer, that Behrens’ test rejects a smaller proportion of
such repeated samples than the proportion specified by the level of
significance, for the sufficient reason that the variance ratio of the
populations sampled was unknown.

This point was early emphasised (Fisher, 1937)⁴ by giving in a
simple case the exact formula for the proportion rejected; it is
irrelevant to the purpose of the test, for the experimenter is not
concerned with repeated samples from the same population. The
population of cases which concern him is specified by the properties of
his sample, not by the functions of an entirely hypothetical
population. The objection raised against Behrens’ test thus seemed
merely irrelevant. Later S. Wilks⁵ has stated that he has proved
that no test can exist in this problem, satisfying the conditions laid
down by Neyman and Pearson. This, one might have thought, would
have settled the matter. It is obviously not an objection to a test
of significance that it does not satisfy conditions which cannot
possibly be satisfied! However, as the point seems still to be disputed
on these grounds, ignoring Wilks’ note, the opening section of this
paper may still serve a useful purpose. Perhaps, also, Professor
Wilks may be induced to publish the proof of his statement, and so
clarify the nature of the “requirement” which Neyman and Pearson
have, apparently inadvertently, introduced.

Reprinted from Annals of Eugenics, Vol. IX, Pt. II, pp. 174-180, 1939.

REFERENCES 

¹ J. Neyman and E. S. Pearson, 1932. “Testing statistical hypotheses in relation
to probabilities a priori.” Proc. Camb. Phil. Soc., vol. 29, pp. 492-510.

² W. V. Behrens, 1929. “Ein Beitrag zur Fehlerberechnung bei wenigen
Beobachtungen.” Landw. Jb., vol. 68, pp. 807-837.

³ R. A. Fisher, 1925-46. Statistical Methods for Research Workers. Edinburgh,
Oliver & Boyd.

⁴ R. A. Fisher, 1937. “On a point raised by M. S. Bartlett on fiducial
probability.” Ann. Eugen., vol. 7, pp. 370-375.

⁵ S. S. Wilks, 1940. “On the problem of two samples from normal populations
with unequal variances.” Ann. Math. Stat., vol. 11, pp. 475-476 (abstract).





THE COMPARISON OF SAMPLES WITH POSSIBLY 
UNEQUAL VARIANCES 

By R. A. FISHER

1. The nature of the problem

For many years, prior to the introduction of exact tests of significance, it was customary, 
when a number of mean values had been obtained, as in a replicated experiment, each based 
on two or more independent observations, to calculate independently a standard error for 
each mean, and thence to obtain a different standard error for each possible comparison 
to be made. 

This procedure, besides being laborious, is open to the objection, in many cases, that the
observed estimates of standard errors, ascribed to different treatments, or varieties, do not 
differ more than would be expected merely from errors of random sampling. When this is 
the case, it is reasonable to conclude that the greater part of the observed differences is in 
fact due to random sampling, and that a more precise, as well as a simpler, analysis would 
be possible by pooling the sums of squares of deviations obtained from different varieties, 
and using the pooled estimate for all the tests of significance required. 

This change (Fisher, 1925-38), which made it possible to make exact tests of significance,
had the advantage of giving precision to the null hypothesis, which the tests were required
to substantiate, or to discredit. For the null hypothesis is now simply that all treatments
or varieties, or those of them chosen for comparison, are equivalent in the circumstances
of the test, and in respect of the measurements used. Consequently, the pooling of the
estimates of error is now habitual in all experimental trials.

Critics concerned to uphold the older biometrical tradition, misunderstanding the nature 
of the hypothesis to be tested, argued that such tests were invalid on the ground that the 
variances of the different varieties were assumed to be equal. The equality of the variances is, 
however, a characteristic of the null hypothesis chosen. This hypothesis is never assumed 
to be true, and the whole point of the procedure is to give the facts an opportunity of
demonstrating its falsity if, in fact, it is not true. It is an hypothesis particularly appropriate to
experimental trials, in that, if a treatment has any effect on the variances of the observed 
values, it must in some circumstances increase them, and in others diminish them; so that 
any hypothesis involving a difference in variances is only of interest when it is already 
admitted that the treatment has any relevant effect at all. 

The advances of statistical science have consisted largely in the provision of exact tests 
of significance appropriate to an increasing variety of useful hypotheses, and occasionally, 
though not characteristically in experimental work, some interest attaches to hypotheses 
implying that the means of two populations are equal, while their variances are unequal. 




At least a theoretical problem of this sort can be framed. The solution has been known for 
nearly ten years (Behrens, 1929), though it has been obscured by some controversy (Bartlett, 
1936), arising, I believe, from a misunderstanding of the nature of the problem. Useful 
tables of the solution have recently been published (Sukhatme, 1938), and it is the purpose 
of the present note to clarify the hypothesis of which they furnish the exact test. 

In putting forward his test of significance “Student” (1908) specifies that the problem 
with which he is concerned is that of a unique sample. His clear intention in this is to exclude 
from his discussion all possible suppositions as to the “true” distribution of the variances 
of the populations which might have been sampled. If such a distribution were supposed 
known, “Student’s” method would be open to criticism and to correction. In following his 
example it is not necessary to deny the existence of knowledge based on previous experience, 
which might modify his result. It is sufficient that we shall deliberately choose to examine
the evidence of the sample on its own merits only. This has not only the advantages of giving
simplicity and definition to the problem, it has the profoundly important effect that modern
tests of significance, treating each body of data as unique, can thereby derive from them
independent evidence which may be compared, knowing it to be independent, with evidence
from other sources. In applying this principle, there is, of course, nothing to prevent us from
combining the evidence of several different samples. We can do so and at the same time treat
the whole body of available material as a unique body of data. Without methods of treating
unique samples, we should have no real guidance in these more complex cases.

This principle is important for our problem, because it might be thought, that in testing
the significance of the difference between two means of normal samples, when the
hypothetical equality of the variances of the populations from which they are drawn is deleted,
it is to be replaced by some supposition, based on previous experience, as to the true ratio of
the variances, or as to the distribution of this true ratio. On the contrary, when any such
previous experience, sufficiently valid to demand inclusion, exists, I suggest that it should be
treated in exactly the same way as the evidence supplied by a unique pair of samples. In
this way it will, of course, add to our information and, in consequence, allow of the rejection
of the hypothesis that the means are equal, in cases in which such a rejection would otherwise
be inadmissible, but its possible existence does not supply any reason for neglecting the
problem of a pair of samples regarded as unique.

For the case in which the variances are by hypothesis equal, any difference between the 
estimated variances is evidence only of sampling error. The element of our hypothesis by 
which the equality of the variances is replaced, is that the observed ratio between the
variances is no evidence that this ratio is in error in one direction, rather than the other.
We suppose, indeed, that it will be affected by sampling error, but the increment or
decrement in the logarithm of the estimate due to errors of random sampling will be supposed, in
the material to which the test is applied, to be distributed exactly as such errors are known 
to be distributed in general, for estimates based on the same numbers of degrees of freedom, 
i.e. in the z distribution. 




The implication of this supposition is that whereas, supposing the variances $\sigma_1^2$ and $\sigma_2^2$
equal, the estimate $s_1^2$ derived from one sample is equally relevant to the estimation of $\sigma_2^2$
as to that of $\sigma_1^2$, now, when the variances are no longer supposed equal, we specify for
exactitude that the value $s_1^2$ is of no relevance for the estimation of $\sigma_2^2$, nor is $s_2^2$ of relevance
for the estimate of $\sigma_1^2$. In this precise sense the unknown variances $\sigma_1^2$ and $\sigma_2^2$ may be spoken of as
“independent by hypothesis”. Such variances may, of course, be near to equality, or may
differ to any possible extent.

In contrasting this hypothesis with that of equality, it is worth noting that, just as the 
latter is appropriate when the variances of the populations sampled are not exactly equal,
but differ by an amount small compared with the errors of sampling, so the hypothesis of
independence implies that real differences are to be expected which are large compared
with the sampling errors. Evidently, in the same material, we may be more interested to 
test the hypothesis of equal variances when the samples are small, and the hypothesis of 
independent variances when the samples are large. Equally, the investigator will be free, 
without incurring the charge of inconsistency, to test the same body of data from these two 
contrasted standpoints. 


2. Analytic properties of the solution

Let

$$n_1(n_1+1)s_1^2 = S_{1}^{\,n_1+1}(x - \bar x)^2, \qquad
  n_2(n_2+1)s_2^2 = S_{1}^{\,n_2+1}(x' - \bar x')^2.$$

A statistician who also knew the true variance ratio of the populations would know the
true relative weights of the means $\bar x$, $\bar x'$; let these be as $1 : w$.

Then

$$(n_1+1)(n_1 s_1^2 + w n_2 s_2^2)$$

would be the sum of homogeneous squares, from which, by dividing by $n_1 + n_2$ and
$n_1 + 1$, the sampling variance of $\bar x$ can be estimated. Hence the sampling variance of the
difference $\bar x - \bar x'$ is

$$\frac{1}{n_1+n_2}\left(1 + \frac{1}{w}\right)(n_1 s_1^2 + w n_2 s_2^2).$$

If any limit,

$$\bar x - \bar x' = d\sqrt{(s_1^2 + s_2^2)},$$

where $d$ depends on $n_1$ and $n_2$, and also on the ratio $s_1/s_2$, were proposed, such a statistician
could calculate

$$t^2 = \frac{d^2(s_1^2+s_2^2)(n_1+n_2)}{(1 + 1/w)(n_1 s_1^2 + w n_2 s_2^2)},$$

and from this value, and the number of degrees of freedom, $n_1 + n_2$, could read the probability
that a pair of samples, from populations having the same mean, should give a difference
between the observed means greater than the limit proposed.





The inclusion of the sum of squares, $s_1^2 + s_2^2$, in the formula above is not arbitrary, but
merely conventional, since $d$ is supposed to vary when the ratio $s_1 : s_2$ is changed. Any limit
of the kind proposed could therefore be put into the form chosen.

The probability obtained by this process clearly involves $w$, and cannot be ascertained
so long as $w$ is unknown. We may, however, suppose that in the material to which the test
is to be applied $w$ takes different values in accordance with a known law. The average value
of the probability will then be the probability, on repeated trials with varying values of $w$,
that a statistician, knowing for each trial the true relative weight but ignorant of the
absolute variability, would find the limit proposed to be exceeded by chance, by the means
of samples from populations having in fact the same mean.

If $v_1$ and $v_2$ are the population variances, then

$$w = \frac{(n_2+1)v_1}{(n_1+1)v_2},$$

and, whatever the values of $v_1$ and $v_2$ may be, if

$$z = \tfrac{1}{2}\log\frac{s_1^2}{w s_2^2},$$

then $z$ will be distributed in its familiar distribution with $n_1$ and $n_2$ degrees of freedom.
Hence fiducially $w$ may be taken to be distributed as is

$$\frac{s_1^2}{s_2^2}\,e^{-2z}.$$


For example, consider the case $n_1 = n_2 = 6$, $s_1 = s_2$, for which according to Sukhatme’s
table, based on Behrens’ formula, $d = 2.435$.

As typical of the variation of $w$ we may take the medians of sixteen ranges of equal
frequency, for which $P$ is an odd number over 32; as the case is symmetrical, only 8 values,
from 1/32 to 15/32, need be tabulated (Table I). The second column then gives the fraction
of the total weight contributed by the less weighty sample, either

$$\frac{w}{w+1} = \frac{1}{e^{2z}+1}, \qquad \text{or} \qquad \frac{1}{w+1} = \frac{e^{2z}}{e^{2z}+1}.$$


Table I. Frequencies with which the tabulated values of d are exceeded
for various possible values of the true relative weight

$P$       $w/(w+1)$    $t/d$      $t$       %
1/32      0.15906                 1.7811     5.02
3/32      0.24050                 2.0814     2.98
5/32      0.29479      0.91190    2.2205     2.33
7/32      0.33927      0.94692    2.3058
9/32      0.37865      0.97010    2.3622
11/32     0.41505      0.98546    2.3996     1.68
13/32                  0.99492    2.4226     1.61
15/32     0.48332      0.99944    2.4336     1.59
Total                                       19.00





Knowing $w$, we can calculate

$$\frac{t}{d} = \frac{2\sqrt{w}}{w+1};$$

the values of $t$ are obtained by using Sukhatme’s value of $d$, and those for the percentage
falling outside the fiducial limits, from “Student’s” (1925) table in Metron. The average
of the eight values is 2.38 %. Seeing that a finer graduation would doubtless have increased
the contribution of the tails, the agreement with 2.50 % is entirely satisfactory.
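
The columns of Table I can be retraced from the second column alone. The sketch below is illustrative and rests on stated assumptions: that for $n_1 = n_2$ and $s_1 = s_2$ the general formula reduces to $t/d = 2\sqrt{w}/(w+1)$, that the percentages are one-tailed probabilities of "Student's" distribution for 12 degrees of freedom, and that a simple numerical integration is accurate enough in place of the Metron table. The row for 13/32, whose second-column entry is not legible here, is omitted; all names are introduced for illustration.

```python
import math


def t_density(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """One-tailed probability P(T > t), by Simpson's rule on [0, t]."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    return 0.5 - total * h / 3


D = 2.435          # Sukhatme's value of d quoted in the text
DF = 12            # n1 + n2
FRACTIONS = [0.15906, 0.24050, 0.29479, 0.33927, 0.37865, 0.41505, 0.48332]

rows = []
for p in FRACTIONS:                       # p = w / (w + 1)
    ratio = 2 * math.sqrt(p * (1 - p))    # equals 2 * sqrt(w) / (w + 1)
    t = ratio * D
    rows.append((p, ratio, t, 100 * upper_tail(t, DF)))

for p, ratio, t, percent in rows:
    print(f"{p:.5f}  {ratio:.5f}  {t:.4f}  {percent:.2f}")
```

The computed percentages fall steadily as the weights approach equality, the smallest fraction of weight contributing much the largest share to the average.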


3. Equivalence with previous solution. 

The analytic equivalence of the two approaches is most easily perceived by means of the 
analysis of variance. 

We have to consider the independent variation of $t$ for $n_1 + n_2$ degrees of freedom, and of
$z$ for $n_1$ and $n_2$ degrees of freedom, and in this double distribution to calculate the total
probability that

$$t^2 > \frac{d^2(s_1^2+s_2^2)(n_1+n_2)}{(s_1^2 + s_2^2 e^{2z})(n_1 + n_2 e^{-2z})}.$$

Now if $A$, $B$ and $C$ are the sums of squares respectively of 1, $n_1$ and $n_2$ homogeneous degrees
of freedom,

$$\frac{(n_1+n_2)A}{B+C} = t^2$$

for $n_1 + n_2$ degrees of freedom, while

$$\frac{n_2 B}{n_1 C} = e^{2z}$$

for $n_1$ and $n_2$ degrees of freedom. Consequently, the inequality defining the region of
significance may be written in terms of $A$, $B$ and $C$ as

$$\frac{A}{B+C} > \frac{d^2(s_1^2+s_2^2)\,BC}{(B+C)(n_1 s_1^2 C + n_2 s_2^2 B)},$$

or, more simply, as

$$\frac{d^2(s_1^2+s_2^2)}{A} < \frac{n_1 s_1^2}{B} + \frac{n_2 s_2^2}{C}. \qquad (2)$$

Using trilinear coordinates $A$, $B$, $C$ the boundary is a conic through the vertices of the
triangle of reference.

We obtain the same analysis of variance, and therefore the same simultaneous distribution
of $A$, $B$ and $C$ by putting

$$B = n_1(n_1+1)s_1^2/v_1,$$

$$C = n_2(n_2+1)s_2^2/v_2,$$






where and have respectively and degrees of freedom, and are distributed 
independently. 





Substituting these values for A, B and C in equation (2), we find 

which is the inequality given by Fisher (1935) and used by Sukhatme. 

The values calculated in Table I show that the reason why $d$ is as high as it is, is that it
makes allowance for the possibility that the relative weight of the two means compared
differs materially from what the samples indicate. In that example, however, the apparent
weights are equal, and to obtain a clear understanding of the test it is worth while to
consider a case in which we can distinguish between the effects of two different possibilities,
which the experimenter will certainly wish to consider: (a) that the true variances are unequal,
and (b) that their ratio differs from that of the estimates derived from the samples.

If, in the general formula,

$$t^2 = \frac{d^2(s_1^2+s_2^2)(n_1+n_2)}{(1 + 1/w)(n_1 s_1^2 + w n_2 s_2^2)},$$

we consider first the supposition that the variances are equal, we shall put

$$w = \frac{n_2+1}{n_1+1},$$

so that

$$d^2 = t^2\,\frac{n_1+n_2+2}{n_1+n_2}\cdot\frac{n_1(n_1+1)s_1^2 + n_2(n_2+1)s_2^2}{(n_1+1)(n_2+1)(s_1^2+s_2^2)}.$$

Note that $d^2$ is not in general equal to $t^2$ in this case, though it is so when $n_1 = n_2$. Thus, if
$n_1 = 6$, $n_2 = 8$, the 5 % value of $t$ for 14 degrees of freedom is 2.145. Taking $s_1^2/s_2^2$ equal
successively to 3, 1 and 1/3, we find

$$d = 2.033,\ 2.109,\ 2.320.$$


If, in the second case, it were supposed that the real relative precision of the two means
were exactly equal to the apparent relative precision, we should have

$$w = s_1^2/s_2^2,$$

whence $$t^2 = d^2.$$

If, on the contrary, we allow for the possibility that the apparent relative precision
differs from the true precision by sampling errors given by the $z$ distribution, the
appropriate values of $d$ are those given in Sukhatme’s table, with which we may compare the
values obtained above.

Table II. Values of $d$ appropriate to different suppositions, $n_1 = 6$, $n_2 = 8$

$s_1^2/s_2^2$            3        1        1/3
Equal variances          2.033    2.109    2.320
Estimated variances      2.145    2.145    2.145
Independent variances    2.398    2.364    2.332

It will be seen from this table that the possibility that the true ratio of weights differs
materially from what it appears to be, is the major factor in requiring a larger value of $d$
when the samples are small. To take into consideration only the possibility that the true
variance ratio is equal to that observed is quite insufficient. When the smaller apparent
variance is associated with the smaller number of degrees of freedom, this test may actually
diminish the value of $d$ below that obtained for equal variances. The problem of a test of
significance for samples with possibly unequal variances has, however, often been conceived
as though in this case the only danger to be considered was that the true variances should
differ from equality as much as appeared from the estimates. The danger that, owing to
random sampling, the estimated ratio should be in error, has not apparently been appreciated.

This may explain why Bartlett (1936) should have thought it could be inferred (for the case $n_1 = n_2$) that the probability of exceeding d must always be greater when $s_1 = s_2$ than when $s_1$ or $s_2 = 0$. He says (p. 666):

An examination of Behrens' complete table ($n_1 = n_2$) might be sufficient to make us suspect its validity, for in all cases the fiducial probability given is less for $s_1/s_2 = 1$ than $s_1/s_2 = 0$ or $\infty$, whereas given T, we should expect to be more sure that the observed difference is significant if $s_1/s_2 = 1$, since in that case there is evidence that $\sigma_1^2 + \sigma_2^2$ is more efficiently estimated.

Sukhatme's work has now shown that at the 5 % level the facts are the reverse of Bartlett's statement when n > 6. It is probable, however, in any case that Bartlett would not now be inclined to press an argument of this sort, for the errors of $s_1^2 + s_2^2$ regarded as an estimate of $\sigma_1^2 + \sigma_2^2$ fail to specify the errors of […]. It would be impossible, without entering exactly into the analysis, to make inferences as to the relative values of d appropriate to different observed ratios. There seems to be no justification for Bartlett's procedure of taking the value of d when $s_1$ or $s_2$ is zero as an upper limit for other cases. At the time of writing, however, Bartlett was evidently under the impression that an analytic error of some kind underlay Behrens' formula, and this perhaps made him expect to find some unreasonable feature in the table.

There can be now no doubt that the supposed error was non-existent. Behrens proposed and gave the correct solution of a perfectly definite problem. Opinions may differ as to the occasions in practical research to which this problem is appropriate, but a discussion of this topic cannot be furthered by the suggestion that the numerical results to which his solution leads are inaccurate. It is probable that, at the time he wrote, Bartlett imagined that he had found a better approach to the same problem, but, as has already been shown (Fisher, 1937), the test of significance on which he relied is irrelevant to the work he was discussing.

REFERENCES 

W.-V. Behrens (1929). "Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen." Landw. Jb. 68, 807-37.

M. S. Bartlett (1936). "The information available in small samples." Proc. Camb. Phil. Soc. 32, 560-6.

P. V. Sukhatme (1938). "On Fisher and Behrens' test of significance for the difference in means of two normal samples." Sankhyā, 4, 39-48.

"Student" (1908). "The probable error of a mean." Biometrika, 6, 1-25.

"Student" (1925). "New tables for testing the significance of observations." Metron, 5, 18-21.

R. A. Fisher (1925-38). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

R. A. Fisher (1935). "The fiducial argument in statistical inference." Ann. Eugen., Lond., 6, 391-8.

R. A. Fisher (1937). "On a point raised by M. S. Bartlett on fiducial probability." Ann. Eugen., Lond., 7, 370-5.



36

THE SAMPLING DISTRIBUTION OF SOME
STATISTICS OBTAINED FROM NON-LINEAR
EQUATIONS*


AUTHOR’S NOTE 

For nearly two decades prior to the date of this publication, the arithmetical procedure of the analysis of variance had been found, in a rapidly expanding field of applications, to provide the most commodious approach to the problem of summarising thoroughly, and interpreting in critical fashion, the kinds of observational data available. In many cases the properties naturally postulated for the observations in question were such as to render the interpretation, and the standard tests of significance, mathematically exact; but this was not always so. New extensions were constantly being made, such, for example, as that implied by the discriminant functions, and it seemed to the author worth while, as in the opening section of this paper, to specify explicitly the conditions for the exactitude of the z-test; and in the succeeding sections to illustrate cases in which it must be inexact, though often presumably a good approximation, the limitations of which could not be specified without the solution of problems so far seemingly intractable.

The paper incorporates the solution of the simultaneous distribution of the latent roots which arise in discriminant analysis, without the formidable notation of matrix algebra. The method of resolution of this rather difficult problem may therefore be of interest in view of other possible applications.


Reprinted from Annals of Eugenics, Vol. IX, Pt. III, pp. 238-249, 1939.


THE SAMPLING DISTRIBUTION OF SOME STATISTICS 
OBTAINED FROM NON-LINEAR EQUATIONS 

By R. A. FISHER

1. The field of the z-test

It has long been recognized (Fisher, 1936) that that aspect of the analysis of variance which consists in comparing the mean square ascribed to some possible causes, or discrepancies, with an appropriate residual mean square, or error, is absolutely valid for normally distributed errors, subject to a certain limitation of the form in which the adjustable parameters are involved.

For example, to take a case of wide generality, in testing the goodness of fit of a regression line, or surface, having a given hypothetical form, no difficulty is introduced merely by reason of the line being curved, or the form non-linear. The idea that special difficulty is involved in non-linear regression is an illusion widely disseminated by the Pearsonian school, who were indeed completely at fault, even in testing the goodness of fit of linear regression.

If the form of the regression line be

$$Y = \beta_1 X_1 + \dots + \beta_p X_p,$$

where $X_1, \dots, X_p$ are any functions of the independent variate x, we may minimize

$$S(y-Y)^2$$

for variations of $\beta_1, \dots, \beta_p$, and obtain linear equations for these adjustable parameters,

$$S\{X_i(y-Y)\} = 0 \qquad (i = 1, \dots, p),$$

having solutions $\beta_1 = b_1, \dots, \beta_p = b_p$.

Multiplying by $b_1, \dots, b_p$, and adding, it follows that, for the solution,

$$S(yY) = S(Y^2),$$

and therefore that

$$S(y-Y)^2 = S(y^2) - S(Y^2).$$

This is the residual sum of squares with n − p degrees of freedom, where n is the number of degrees of freedom possessed by the set of values y, which may themselves be deviations from some simpler formula; while $S(Y^2)$ has p degrees of freedom. Then in general the analysis of variance

                 Degrees of freedom    Sum of squares
  Regression            p              $S(Y^2)$
  Residual              n − p          $S(y-Y)^2$
  Total                 n              $S(y^2)$

supplies a direct and exact test of significance for the set of adjustments represented by $b_1$ to $b_p$.
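The partition of the total sum of squares does not depend on the regressors being linear in x, only on the fitted value being linear in the adjustable parameters. A minimal Python sketch of a numerical check follows; the data values, the choice of regressors x and x², and the helper name `fit_two_terms` are illustrative assumptions and are not taken from the paper.

```python
def fit_two_terms(x, y, f1, f2):
    """Least-squares fit of Y = b1*f1(x) + b2*f2(x) via the 2x2 normal equations."""
    X1 = [f1(v) for v in x]
    X2 = [f2(v) for v in x]
    s11 = sum(a * a for a in X1)
    s12 = sum(a * b for a, b in zip(X1, X2))
    s22 = sum(b * b for b in X2)
    s1y = sum(a * v for a, v in zip(X1, y))
    s2y = sum(b * v for b, v in zip(X2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the two normal equations S{X_i (y - Y)} = 0
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    fitted = [b1 * a + b2 * b for a, b in zip(X1, X2)]
    return b1, b2, fitted


# Hypothetical observations; the regressors are x and x**2 (non-linear in x).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 4.1, 8.8, 16.3, 24.9, 36.4]
b1, b2, Y = fit_two_terms(x, y, lambda v: v, lambda v: v * v)

ss_total = sum(v * v for v in y)                        # S(y^2), n degrees of freedom
ss_regression = sum(v * v for v in Y)                   # S(Y^2), p degrees of freedom
ss_residual = sum((v - w) ** 2 for v, w in zip(y, Y))   # S(y - Y)^2, n - p degrees of freedom

print(ss_residual, ss_total - ss_regression)
```

The two printed quantities agree to rounding error, which is the identity used in the analysis of variance above.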

For goodness of fit, where we have several observations y in each array, the most complicated form of curve is one which has as many variable constants as there are arrays. All such curves will pass through the array means, and all are indistinguishable, so far as such data are concerned. The goodness of fit of any curve involving a − p restrictions on the array means may therefore be tested in a similar analysis:

                               Degrees of freedom    Sum of squares
  Deviation from fitted line        a − p             …
  Within arrays                     n − a             $S(y-\bar{y}_a)^2$
  Total                             n − p             …

It will be noticed that the expected values from which deviations are calculated in the second line are the array means, so that the vanishing of this sum of squares implies for all observations, $y = \bar{y}_a$. If each y is a co-ordinate in generalized space, these conditions are satisfied in a plane continuum of a dimensions, within which lies a more restricted plane continuum of p dimensions representing all possible fitted curves of the form chosen.

The observation point O will not in general lie within either space. The foot A of the perpendicular dropped from it on the a-space determines the observed means of the arrays; its length, OA, determines the sum of squares within arrays. The foot P of the perpendicular dropped on the p-space determines the fitted curve; its projection AP in the a-space gives the sum of squares of deviations. The test of goodness of fit consists in recognizing that, if this is unduly large compared with OA, the hypothetical form is incompatible with the data.

What is essential for the generality of the analysis is clearly not the linearity of the regression line, but the linearity of the relations among observation points for different sets of deviations to vanish simultaneously. Thus, if expectations are expressed in terms of parameters in any way, we may eliminate the parameters and obtain the relations which hold among the expectations of direct observations, equally liable to error. It is these which must be linear for the simple procedure of analysis of variance itself to supply exact tests of significance.

It is, of course, to be anticipated that the ordinary tests will still be sufficiently exact when the radii of curvature of the spaces concerned are all large compared with the distances. This condition has been recognized in the use of least square formulae for non-linear equations in astronomy and geodesy; the real importance of non-linearity does not lie, however, as has been supposed, in the fact that with equations non-linear in the parameters employed, the process of fitting may have to be iterated. This in itself presents no difficulty; what is important is that after a good fit is obtained, and the sums of squares to be compared have been accurately evaluated, curvatures of the restriction spaces, if they are not small relative to the distances indicated by the sums of squares, may appreciably affect the frequency distribution of their ratios.

2. The test of significance of a harmonic component

Examples, in which the analysis is not too difficult, are rare; an early one was provided by the process of testing the reality of harmonic components in a series of equally spaced observations.

Supposing such a series to be composed of independent and normally distributed values, then any function

$$S(a_r u_r)$$

will have a mean value zero, if S(a) = 0, and will be normally distributed with the same variance as a single observation, if

$$S(a^2) = 1.$$

If the number in the series is odd, 2s + 1, then

$$a_r = \frac{1}{\sqrt{s+\tfrac12}}\cos\frac{2\pi pr}{2s+1}, \qquad b_r = \frac{1}{\sqrt{s+\tfrac12}}\sin\frac{2\pi pr}{2s+1},$$

for values of p from 1 to s, will constitute 2s mutually orthogonal components, occurring in pairs having periods (2s + 1)/p units. The like is true of an even number, 2s + 2, only in this case there is an unpaired component of period 2 units. We may omit this component from consideration by deducting

$$\frac{1}{2s+2}\,(u_1 - u_2 + u_3 - \dots - u_{2s+2})^2$$

from $S(u-\bar{u})^2$, so that the remainder consists of two components for each of s different periods.

The sum of two squares for each period constitutes a certain fraction of the whole. If we choose that period out of s available which makes the largest contribution, it has been shown (Fisher, 1929) that the fraction, g, taken by this period is distributed so that the probability of exceeding any value g is

$$P = s(1-g)^{s-1} - \frac{s(s-1)}{2}(1-2g)^{s-1} + \dots + (-1)^{k-1}\frac{s!}{k!\,(s-k)!}(1-kg)^{s-1}, \qquad (1)$$

in which k is the greatest integer less than 1/g.
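The alternating series for the probability of exceeding g is easily evaluated by machine. The sketch below is an illustration rather than the author's procedure: the function name `fisher_g_tail` is hypothetical, and the series is taken in the form $\sum_k (-1)^{k-1}\binom{s}{k}(1-kg)^{s-1}$ with terms kept only while $1 - kg$ is positive.

```python
from math import comb


def fisher_g_tail(g, s):
    """Probability that the largest of s fractions exceeds g.

    Terms are kept only while 1 - k*g is positive, i.e. for k up to the
    greatest integer less than 1/g (and never beyond s).
    """
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]")
    total = 0.0
    k = 1
    while k <= s and 1.0 - k * g > 0.0:
        total += (-1) ** (k - 1) * comb(s, k) * (1.0 - k * g) ** (s - 1)
        k += 1
    return total


# With two periods the larger share is uniform between 1/2 and 1.
print(fisher_g_tail(0.75, 2))
# With five periods, g = 0.68377 lies very close to the 5 per cent point.
print(fisher_g_tail(0.68377, 5))
```

For values of g below 1/s every term of the series is present and the sum reduces to unity, as it must, since the largest of s fractions cannot be less than 1/s.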





Now, the choice of one period out of the s Fourier submultiples available, together with the evaluation of the two corresponding coefficients, gives that function

$$U = A\cos\frac{2\pi pr}{2s+1} + B\sin\frac{2\pi pr}{2s+1},$$

involving the adjustable constants A, B and p, which minimizes

$$S(u-U)^2$$

for integral values of p. If p were adjusted to some intermediate value it would presumably give a lower value still.

To find the mean of g as distributed in random samples, it is convenient to use the expression for 1 − P,

…

for values of k up to s for which kg > 1.

Then the mean of g may be evaluated as

…

in which the limits for term k are 1/k and 1. The contribution of each term is therefore

…

which is to be summed for values of k from 2 to s.

Now, if $u_k$ is a polynomial in k of degree less than s,

…

putting $u_k = \dots$, the mean is found to be

…

This may be expressed as the definite integral

$$\frac{1}{s}\int_0^1 \{1 + (1-x) + (1-x)^2 + \dots + (1-x)^{s-1}\}\,dx,$$

or $\frac{1}{s}\{\digamma(s) - \digamma(0)\}$.


The values of $\bar{g}$, of 3/2s, and of $2s\bar{g}/3$ are given below for the first few values of s.

Table I. Values of $\bar{g}$ for small samples

  Length of series    s    $\bar{g}$    3/2s     $2s\bar{g}/3$
         5            2    3/4          3/4      1
         7            3    11/18        1/2      1.2222
         9            4    25/48        3/8      1.3889
        11            5    137/300      3/10     1.5222
        13            6    49/120       1/4      1.6333
        15            7    363/980      3/14     1.7286


Instead of accounting for only the fraction 3/2s of the total sum of squares, as would be the case for a formula linear in three adjustable parameters, the fraction accounted for in large samples is about

$$\frac{\gamma + \log s}{s},$$

the ratio of which to 3/2s increases indefinitely as the size of the sample is increased; if s = 12 it is double, and at s = 227 about four times expectation.
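The tabulated means and the large-sample ratios may be checked directly. The sketch below assumes that the definite integral for the mean is integrated term by term, giving (1/s)(1 + 1/2 + ... + 1/s); the function names and the constant `EULER_GAMMA` are illustrative choices.

```python
from fractions import Fraction
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def mean_g(s):
    """Exact mean of g for s periods: (1/s) * (1 + 1/2 + ... + 1/s)."""
    return sum(Fraction(1, k) for k in range(1, s + 1)) / s


def large_sample_ratio(s):
    """Ratio of the large-sample fraction (gamma + log s)/s to 3/(2s)."""
    return 2.0 * (EULER_GAMMA + log(s)) / 3.0


# Columns: length of series, s, mean of g, 3/(2s), and their ratio.
for s in range(2, 8):
    g_bar = mean_g(s)
    linear = Fraction(3, 2 * s)
    print(2 * s + 1, s, g_bar, linear, round(float(g_bar / linear), 4))

print(large_sample_ratio(12), large_sample_ratio(227))
```

Exact rational arithmetic is used for the small-sample values so that the fractions of the table are reproduced without rounding.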

Note that if p were fixed, and only A and B adjustable, the eliminant would be

$$u_r - 2u_{r+1}\cos\frac{2\pi p}{2s+1} + u_{r+2} = 0,$$

a linear equation between any three adjacent observations; whereas, when p also is adjustable, we have

$$\frac{u_r + u_{r+2}}{u_{r+1}} = \frac{u_{r+1} + u_{r+3}}{u_{r+2}}$$

for any four adjacent observations; and this eliminant, being non-linear, shows that the formula cannot be replaced by one linear in the three constants required.


3. The simultaneous distribution of latent roots 

The most important case, for practical statistics, of an analysis derived from the solution of non-linear equations, is one which has been approached from different standpoints by several different groups of writers, notably Hotelling (1933, 1935, 1936), Mahalanobis (1930, 1936), Bose (1936) and Bose & Roy (1938), and which I have recently discussed under the heading of "The statistical utilization of multiple measurements" (Fisher, 1938).










The problem that emerges, when the effect of these researches is viewed in its mathematical generality, may be stated as follows:

If $x_1, \dots, x_i, \dots, x_p$ stand for p different variates and $a_{ij}$ for the sum of products of any two of them, an analysis of covariance will give values $a_{ij}$ separately for several different distinguishable causes.

In connexion with any series $a_{ij}$ corresponding with $n_1$ degrees of freedom we shall wish to consider the totals $A_{ij}$ (of $n_1 + n_2$ degrees of freedom) including the contributions of error, or of other causes, from which our chosen series may or may not be significantly distinguishable.

If, using arbitrary multipliers $b_i$, positive or negative, we make a compound variate

$$X = \sum_i b_i x_i,$$

then the sums of squares for X are

                D.F.            S.S.
  Treatment     $n_1$           $\sum\sum b_i b_j a_{ij}$
  Remainder     $n_2$           $\sum\sum b_i b_j (A_{ij} - a_{ij})$
  Total         $n_1 + n_2$     $\sum\sum b_i b_j A_{ij}$

and it is to be noted that the ratio of the sums of squares is stationary for all variations of b, if

$$\sum_j b_j (a_{ij} - \theta A_{ij}) = 0 \qquad (i = 1, \dots, p).$$

Hence θ is one of the roots of the equation

$$|a_{ij} - \theta A_{ij}| = 0, \qquad (4)$$

which is at most of degree p. For the present we may suppose that $n_1$ and $n_2$ both exceed p − 1.

Note that θ is also the ratio of the part to the whole of the sums of squares of X, so that all the roots of the equation lie in the range from 0 to 1. The fundamental problem is the sampling distribution of these roots considered simultaneously.
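For two variates the determinantal equation is a quadratic in θ and may be solved directly. The following Python sketch is illustrative only: the function name `roots_2x2`, the tuple layout (m11, m12, m22) and the numerical sums of squares and products are all assumptions made for the example.

```python
from math import sqrt


def roots_2x2(a, A):
    """Roots theta of det(a - theta*A) = 0 for symmetric 2x2 matrices.

    a and A are given as (m11, m12, m22); A is the total, i.e. the
    treatment matrix a plus the remainder.
    """
    a11, a12, a22 = a
    A11, A12, A22 = A
    quad = A11 * A22 - A12 ** 2                      # coefficient of theta^2
    lin = a11 * A22 + a22 * A11 - 2.0 * a12 * A12    # minus the coefficient of theta
    const = a11 * a22 - a12 ** 2                     # constant term
    disc = sqrt(lin ** 2 - 4.0 * quad * const)
    return (lin + disc) / (2.0 * quad), (lin - disc) / (2.0 * quad)


# Hypothetical sums of squares and products (a11, a12, a22).
treatment = (2.0, 1.0, 3.0)
remainder = (4.0, 0.0, 5.0)
total = tuple(t + r for t, r in zip(treatment, remainder))

theta_1, theta_2 = roots_2x2(treatment, total)
print(theta_1, theta_2)
```

Both roots fall between 0 and 1, in agreement with their interpretation as the ratio of a part to the whole of a sum of squares.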

The simultaneous distribution of the sums of squares and products, of a couple of variates x and y, was given by the author (Fisher, 1915) in a form equivalent to

… (5)

in which the frequency element of the parent population is

…

Clearly N, the size of the bivariate sample, may in general be replaced by $n_1 + 1$, when $n_1$ is the number of degrees of freedom on which the estimates are based.








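If now a second sample yields values $a'_{11}, a'_{12}, a'_{22}$, we shall have a similar distribution in which these values are substituted for $a_{11}, a_{12}, a_{22}$, and $n_2$ for $n_1$. The frequency with which the two bivariate samples give values in the range

$$da_{11}\,da_{12}\,da_{22}\,da'_{11}\,da'_{12}\,da'_{22}$$

is given by the product of two such expressions.

With homogeneous samples we may argue in like manner from the totals, obtaining the total frequency of samples for which

$$a_{11} + a'_{11} = A_{11}, \qquad a_{12} + a'_{12} = A_{12}, \qquad a_{22} + a'_{22} = A_{22}$$

have given values. Noting also that, when $a_{11}, a_{12}, a_{22}$ are fixed,

$$da'_{11}\,da'_{12}\,da'_{22} = dA_{11}\,dA_{12}\,dA_{22},$$

it follows that, given $A_{11}, A_{12}, A_{22}$, the simultaneous distribution of $a_{11}, a_{12}$ and $a_{22}$ is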


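$$df = \frac{\{\tfrac12(n_1+n_2-2)\}!\,\{\tfrac12(n_1+n_2-3)\}!}{\sqrt{\pi}\,\{\tfrac12(n_1-2)\}!\,\{\tfrac12(n_1-3)\}!\,\{\tfrac12(n_2-2)\}!\,\{\tfrac12(n_2-3)\}!}\;\frac{|a|^{\frac12(n_1-3)}\,|A-a|^{\frac12(n_2-3)}}{|A|^{\frac12(n_1+n_2-3)}}\;da_{11}\,da_{12}\,da_{22}. \qquad (6)$$

This is the case of two variates of the more general distribution for p variates

$$df = \frac{\{\tfrac12(n_1+n_2-2)\}!\cdots\{\tfrac12(n_1+n_2-p-1)\}!}{\pi^{\frac14 p(p-1)}\,\{\tfrac12(n_1-2)\}!\cdots\{\tfrac12(n_1-p-1)\}!\,\{\tfrac12(n_2-2)\}!\cdots\{\tfrac12(n_2-p-1)\}!}\;\frac{|a|^{\frac12(n_1-p-1)}\,|A-a|^{\frac12(n_2-p-1)}}{|A|^{\frac12(n_1+n_2-p-1)}}\;\prod_{i \le j} da_{ij}. \qquad (7)$$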

This very general distribution is derived by reasoning, exactly parallel with that above, 
from the distribution of a single set of variances and covariances 


’ U(Sr2)}! .7. {r(n-^- Ij}^ I “ I*'”-''-"*.. *»»n. 

where |a| and |α| stand for the determinants of the a's and α's respectively. This is the generalization for p variates of the bivariate distribution given above. It may be derived either by the geometrical argument used by Wishart (1928), or as Hotelling has shown (1933, p. 51a), by an inductive chain, using the fact that the distribution of a partial correlation from which i variates have been eliminated, is exactly the same as that of a total correlation based on a sample smaller by i, and that such partial correlations must be distributed independently of the variates eliminated.

The distribution with which we are concerned is invariant for all linear transformations of the variates, so that we may at this point simplify the algebra by choosing

$$A_{11} = 1, \quad A_{12} = 0, \quad A_{22} = 1; \qquad a_{11} = a, \quad a_{12} = b, \quad a_{22} = c.$$








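The roots θ, θ′ of the equation then satisfy the symmetric relations

$$\theta + \theta' = a + c = p,$$

$$\theta\theta' = ac - b^2 = q,$$

whence

$$(1-\theta)(1-\theta') = (1-a)(1-c) - b^2 = 1 - p + q,$$

$$(a-c)^2 = p^2 - 4q - 4b^2,$$

and

$$\frac{\partial(p, q)}{\partial(a, c)} = a - c.$$

The simultaneous distribution of p, q and b is therefore

$$df = \frac{(n_1+n_2-2)!}{4\pi\,(n_1-2)!\,(n_2-2)!}\;q^{\frac12(n_1-3)}\,(1-p+q)^{\frac12(n_2-3)}\,dp\,dq\;\frac{2\,db}{\sqrt{p^2-4q-4b^2}},$$

where the factor 2 has been inserted on the understanding that the integration is taken over the region in which a exceeds c.

For given p and q, b may take any value between the limits $\pm\tfrac12\sqrt{p^2-4q}$. Integrating between these limits, the last factor then is replaced by the constant, π.

To obtain the simultaneous distribution of the roots, note that

$$\frac{\partial(p, q)}{\partial(\theta, \theta')} = \theta - \theta', \qquad (8)$$

giving

$$df = \frac{(n_1+n_2-2)!}{4\,(n_1-2)!\,(n_2-2)!}\;(\theta\theta')^{\frac12(n_1-3)}\,\{(1-\theta)(1-\theta')\}^{\frac12(n_2-3)}\,(\theta-\theta')\,d\theta\,d\theta'. \qquad (9)$$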

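For p variates, when p does not exceed $n_1$ or $n_2$, the general distribution of the p roots is, as might, apart from a factor involving p only, at this stage be expected,

$$df = \frac{\pi^{\frac12 p}}{(\tfrac12 p-1)!\cdots(-\tfrac12)!}\cdot\frac{\{\tfrac12(n_1+n_2-2)\}!\cdots\{\tfrac12(n_1+n_2-p-1)\}!}{\{\tfrac12(n_1-2)\}!\cdots\{\tfrac12(n_1-p-1)\}!\,\{\tfrac12(n_2-2)\}!\cdots\{\tfrac12(n_2-p-1)\}!}$$

$$\times\;(\theta_1\cdots\theta_p)^{\frac12(n_1-p-1)}\,\{(1-\theta_1)\cdots(1-\theta_p)\}^{\frac12(n_2-p-1)}$$

$$\times\;(\theta_1-\theta_2)\cdots(\theta_1-\theta_p)(\theta_2-\theta_3)\cdots(\theta_{p-1}-\theta_p)\,d\theta_1\cdots d\theta_p. \qquad (10)$$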

The general form, apart from the constant factor, may be demonstrated by using the transformation

$$a_{jk} = \sum_i \theta_i\,e_{ij}\,e_{ik},$$

which, if

$$\sum_i e_{ij}^2 = 1, \qquad \sum_i e_{ij}\,e_{ik} = 0 \ \text{ when } j \ne k,$$

satisfies the requirement that the sum of the principal minors of |a| of degree s is equal to the sum of the products of θ taken s at a time. For example,

$$\sum_j a_{jj} = \sum_i \theta_i,$$

and so on.







We may now replace the ½p(p + 1) variates $a_{ij}$ by p variates θ and ½p(p − 1) functionally independent values e. The Jacobian of this transformation is of degree ½p(p − 1) in θ, and must consist, apart from a constant involving p, of the product of the ½p(p − 1) differences, for it can be shown to contain such differences as $\theta_1 - \theta_2$ as a factor.

If as variates we choose those ey for which j>t, then to satisfy the condition 

0 ^ 


we find 


^ti _ 

0Ci2 Cji 

^2 _ 

^'11 


whence 


0ey 

0ei, 


i > 2, 



fill 


{e,-o,y 


For all values of i and j, this contains the factor $(\theta_1 - \theta_2)$, which, therefore, divides the Jacobian. Similarly, this is true of every difference $(\theta_l - \theta_m)$, thus establishing the form of the distribution. The constant factor in the Jacobian cannot involve $n_1$ or $n_2$, so that, for any value of p, putting $n_1 = n_2 = p + 1$, we may evaluate it by direct integration. An elementary evaluation of this function of p is given in the following section.

A more formal demonstration of this important distribution is exhibited in this number by Dr P. L. Hsu.


4. Some special cases

A common source of such an analysis of variance as was considered in § 3 is the expression of p dependent in terms of $n_1$ independent variates. This situation has been most elegantly elucidated by Hotelling. The linear compound corresponding to the largest root has been termed by him the "most predictable criterion". The compounds corresponding with the whole series of roots he has termed the canonical series of dependent variates, while the multiple regression formulae for these form the corresponding covariant series of dependent variates. The roots θ are then the squares of the correlation coefficients between different pairs. The same roots are thus obtained if the sets of variates are interchanged. Thus, if p exceeds $n_1$ there are only $n_1$ roots, and $n_1$ and p are interchanged in the distribution formula. If either set of variates is normally distributed the distribution of the roots, for independence of the sets of variates, or for homogeneity in respect of the second set of samples selected in respect of the first set, is unaffected by non-normality of the other set. This explains the curious equivalence of the discriminant function for a single contrast with the partial regression formula for an artificial variate introduced to register the contrast to which attention was called in an earlier paper (Fisher, 1938).





This case is reproduced by putting $n_1 = 1$; then the distribution of θ is


being the test of significance of Hotelling’s generalized “Student’s” ratio, as used in simple 
discriminant analysis. 

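Again, in the case p = 2, the simultaneous distribution of the two roots is

$$df = \frac{\sqrt{\pi}\,(\tfrac12 n_1+\tfrac12 n_2-1)!\,(\tfrac12 n_1+\tfrac12 n_2-\tfrac32)!}{(\tfrac12 n_1-1)!\,(\tfrac12 n_1-\tfrac32)!\,(\tfrac12 n_2-1)!\,(\tfrac12 n_2-\tfrac32)!}\;(\theta_1\theta_2)^{\frac12(n_1-3)}\,\{(1-\theta_1)(1-\theta_2)\}^{\frac12(n_2-3)}\,(\theta_1-\theta_2)\,d\theta_1\,d\theta_2$$

$$= \frac{(n_1+n_2-2)!}{4\,(n_1-2)!\,(n_2-2)!}\;q^{\frac12(n_1-3)}\,(1-p+q)^{\frac12(n_2-3)}\,dp\,dq,$$

where p is the sum, and q the product of the roots. For given q the limits of p are $2\sqrt{q}$ and 1 + q. Integrating with respect to p, we have

$$df = \frac{(n_1+n_2-2)!}{2\,(n_1-2)!\,(n_2-1)!}\;q^{\frac12(n_1-3)}\,(1-2\sqrt{q}+q)^{\frac12(n_2-1)}\,dq = \frac{(n_1+n_2-2)!}{(n_1-2)!\,(n_2-1)!}\;(\sqrt{q})^{n_1-2}\,(1-\sqrt{q})^{n_2-1}\,d\sqrt{q},$$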
the curious result found by Wilks in studying the distribution of the ratio of the so-called "generalized variance", which is the product of the roots. The significance of such a product may thus be tested by a z-test. Such a test would, of course, allow a large root due to some relevant causation to be obscured by the accident that the second root happened to be exceptionally small. It will generally be the largest root, or the largest of those doubtfully significant, which will need to be tested.

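An important section of the general distribution is the limiting form when $n_2$ is large. If

$$n_2 \to \infty, \qquad n_1 = n,$$

and $n_2\theta_i \to 2\phi_i$, the simultaneous distribution of the p variates φ is seen to be

$$df = \frac{\pi^{\frac12 p}}{(\tfrac12 n-1)!\cdots(\tfrac12 n-\tfrac12 p-\tfrac12)!\;(\tfrac12 p-1)!\cdots(-\tfrac12)!}\;e^{-\phi_1-\dots-\phi_p}\,(\phi_1\cdots\phi_p)^{\frac12(n-p-1)}\,(\phi_1-\phi_2)\cdots(\phi_{p-1}-\phi_p)\,d\phi_1\cdots d\phi_p,$$

where $0 < \phi_p < \dots < \phi_1 < \infty$. (11)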

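Apart from its own analytic interest, this form supplies a simple demonstration of the form of the function of p referred to in § 3. For let

$$F(p)\,(\tfrac12 n-1)!\cdots(\tfrac12 n-\tfrac12 p-\tfrac12)! = \int e^{-\phi_1-\dots-\phi_p}\,(\phi_1\cdots\phi_p)^{\frac12(n-p-1)}\,(\phi_1-\phi_2)\cdots(\phi_{p-1}-\phi_p)\,d\phi_1\cdots d\phi_p,$$

when the integration extends over all admissible sets of values; then if n = p + 1

$$(\tfrac12 p-\tfrac12)!\cdots(0)!\,F(p) = \int e^{-\phi_1-\dots-\phi_p}\,(\phi_1-\phi_2)\cdots(\phi_{p-1}-\phi_p)\,d\phi_1\cdots d\phi_p. \qquad (12)$$

Now substitute in the right-hand element

$$\phi_i = \phi_p + \phi'_i, \qquad i = 1, \dots, p-1;$$

then we have the recurrence equation

$$(\tfrac12 p-\tfrac12)!\cdots(0)!\,F(p) = \frac{1}{p}\,(\tfrac12 p)!\cdots(1)!\,F(p-1).$$

Removing the common factors, it appears that

$$F(p)/F(p-1) = \frac{1}{p}\,(\tfrac12 p)!/(\tfrac12)! = (\tfrac12 p-1)!/\sqrt{\pi}. \qquad (13)$$

Since also, when p = 1, $F(p) = 1 = (-\tfrac12)!/\sqrt{\pi}$, it follows in general that

$$F(p) = \pi^{-\frac12 p}\,(\tfrac12 p-1)!\cdots(-\tfrac12)!, \qquad (14)$$

the form given in § 3.

Returning to equation (12) it is interesting to note that it may be written in the form

$$\int e^{-\phi_1-\dots-\phi_p}\,(\phi_1-\phi_2)\cdots(\phi_{p-1}-\phi_p)\,d\phi_1\cdots d\phi_p = \pi^{-\frac12 p}\prod_{i=1}^{p}(\tfrac12 p-\tfrac12 i)!\,(\tfrac12 p-\tfrac12 i-\tfrac12)!$$

$$= \prod_{i=1}^{p}(p-i)!\,2^{-(p-i)} = 2^{-\frac12 p(p-1)}\,(p-1)!\,(p-2)!\cdots(0)!. \qquad (15)$$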

SUMMARY

Hitherto little has been known of the distribution of statistics requiring the solution of non-linear equations. The test of significance of a selected harmonic, obtained some years ago, shows, however, that some complexity is to be expected in exact tests of significance for these.

In the present paper the solution is given of the simultaneous distribution of the roots of certain quantic equations which arise in discriminant analysis.








REFERENCES 

R. C. Bose (1936). "On the exact distribution of D² statistic." Sankhyā, 2, 143-54.

R. C. Bose & S. N. Roy (1938). "The exact distribution of the Studentized D² statistic." Sankhyā, 4, 19-38.

R. A. Fisher (1915). "Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population." Biometrika, 10, 507-21.

R. A. Fisher (1929). "Tests of significance in harmonic analysis." Proc. Roy. Soc. A, 125, 54-9.

R. A. Fisher (1936). "The significance of regression coefficients." Cowles Commission for Research Conference, Colorado College Publication, no. 208, pp. 63-7.

R. A. Fisher (1938). "The statistical utilization of multiple measurements." Ann. Eugen., Lond., 8, 376-86.

H. Hotelling (1933). "Analysis of a complex of statistical variables into principal components." J. Educ. Psychol. 24, 417-41 and 498-520.

H. Hotelling (1935). "The most predictable criterion." J. Educ. Psychol. 26, 139-42.

H. Hotelling (1936). "Relations between two sets of variates." Biometrika, 28, 321-77.

P. C. Mahalanobis (1930). "On tests and measures of group divergence. Part I. Theoretical formulae." J. Asiat. Soc. Beng. 26, 541-88.

P. C. Mahalanobis (1936). "On the generalized distance in statistics." Proc. Nat. Inst. Sci. Ind. 2, 49-55.

J. Wishart (1928). "The generalized product moment distribution in samples from a normal multivariate population." Biometrika, 20 A, 32-52.



37

ON THE SIMILARITY OF THE DISTRIBUTIONS
FOUND FOR THE TEST OF SIGNIFICANCE IN
HARMONIC ANALYSIS, AND IN STEVENS'S
PROBLEM IN GEOMETRIC PROBABILITY


Reprinted from Annals of Eugenics, Vol. X, Pt. I, pp. 14-17, 1940.



ON THE SIMILARITY OF THE DISTRIBUTIONS
FOUND FOR THE TEST OF SIGNIFICANCE IN
HARMONIC ANALYSIS, AND IN STEVENS'S
PROBLEM IN GEOMETRICAL PROBABILITY

By R. A. FISHER


1. Statement of the two problems

In a recent note W. L. Stevens (1939) has published an ingenious solution of the following problem in geometrical probability:

"On the circumference of a circle of unit length, n arcs, each of length x, are marked off at random. What is the probability that every point of the circle is included in at least one of these arcs?"

The expression for this probability at which he arrives is

$$1 - n(1-x)^{n-1} + \frac{n(n-1)}{2!}(1-2x)^{n-1} - \dots + (-1)^k\frac{n!}{k!\,(n-k)!}(1-kx)^{n-1},$$

in which k is the greatest integer less than 1/x.

It will be observed that the solution is strikingly similar to one at which I arrived, in 1929, for the apparently very different problem of testing the significance of the largest of the harmonic components into which a series of observations may be analysed.

If a series $u_1, u_2, \dots, u_{2n+1}$ constitute a random sample from a normally distributed population, the linear functions defined by

$$A = S(a_r u_r), \qquad B = S(b_r u_r),$$

in which

$$a_r = \frac{1}{\sqrt{n+\tfrac12}}\cos\frac{2\pi pr}{2n+1}, \qquad b_r = \frac{1}{\sqrt{n+\tfrac12}}\sin\frac{2\pi pr}{2n+1},$$

represent the coefficients of $a_r$ and $b_r$ in the harmonic expansion of u. The contribution of any particular period (2n + 1)/p is measured by

$$x = A^2 + B^2.$$

Values of p from 1 to n supply n values of x, such that

$$\sum_{p=1}^{n} x = S(u-\bar{u})^2.$$

The period showing the highest value of x may be tested for significance by considering the fraction of the total sum of squares for which it accounts; thus, if g is the largest of the fractions

$$x/S(x),$$

it was shown that the probability of obtaining so great, or a greater, value, by chance is

$$P = n(1-g)^{n-1} - \frac{n(n-1)}{2}(1-2g)^{n-1} + \dots + (-1)^{k-1}\frac{n!}{k!\,(n-k)!}(1-kg)^{n-1},$$

where k is the greatest integer less than 1/g.

In other words, the probability that the largest observed fraction is less than x is identical with the probability found in Stevens's problem. It is of some interest to elucidate this curious equivalence of the two problems.


2. The equivalence of the problems 

The solution of the problem in harmonic analysis was derived from the simultaneous distribution

$$df = e^{-(x_1+x_2+\dots+x_n)/c}\,dx_1\cdots dx_n\,c^{-n};$$

the frequency density is constant over any plane finite region for which S(x) is constant, and $x_r \ge 0$, for all values of r. These inequalities bound a series of generalized tetrahedra, so that the problem is equivalent to:

If $g_1, g_2, \dots, g_n$ are n fractions such that $S(g_p) = 1$, and the frequency element is proportional to

$$dg_1\,dg_2\cdots dg_{n-1},$$

what is the probability that the largest fraction is less than any assigned value x? The equivalence of this problem to that of Stevens is now readily demonstrated.

Let $g_r$ stand for the fraction of the circumference of the circle (or other closed contour) through which any arc must be shifted in order to coincide with the next, taken in order round the circumference. Then

$$S(g) = 1.$$

Next, so long as neither $g_r$ nor $g_{r-1}$ is made to be negative, any one arc may be displaced, in such a way that $g_r + g_{r-1}$ is constant, and the frequency with which its value falls in any element within these limits, is proportional to $dg_r$. Hence for simultaneous variation the frequency element is proportional to

$$dg_1\,dg_2\cdots dg_{n-1}$$

within the limits $g_r \ge 0$, S(g) = 1.

Now the probability that the greatest fraction is less than or equal to x, is the probability that each fraction without exception is less than or equal to x, and if each arc is of length x, this is the condition that no gaps occur between two successive arcs.
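The equivalence may be illustrated numerically by comparing the alternating series with a direct simulation of the arcs. The Python sketch below is an illustration under stated assumptions: the coverage probability is taken as $\sum_k (-1)^k\binom{n}{k}(1-kx)^{n-1}$, summed while $1 - kx$ is positive, and the function names, the seed and the number of trials are arbitrary choices.

```python
import random
from math import comb


def coverage_probability(n, x):
    """Probability that n random arcs of length x cover a circle of unit length."""
    total = 0.0
    k = 0
    while k <= n and 1.0 - k * x > 0.0:
        total += (-1) ** k * comb(n, k) * (1.0 - k * x) ** (n - 1)
        k += 1
    return total


def simulate_coverage(n, x, trials, seed=0):
    """Monte Carlo estimate: the circle is covered when no spacing exceeds x."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        starts = sorted(rng.random() for _ in range(n))
        # Spacings between successive starting points, taken round the circle.
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        gaps.append(1.0 - starts[-1] + starts[0])   # wrap-around spacing
        if max(gaps) <= x:
            covered += 1
    return covered / trials


print(coverage_probability(5, 0.4), simulate_coverage(5, 0.4, 20000))
```

The simulation works with the spacings g between successive starting points, which is precisely the set of fractions whose greatest member is in question in the harmonic problem.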


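3. Stevens's extension of the problem

Stevens extends his solution to finding the probability that there shall be i gaps uncovered by the chosen arcs. He obtains

$$f(i) = \frac{n!}{i!\,(n-i)!}\left\{(1-ix)^{n-1} - (n-i)\,(1-(i+1)x)^{n-1} + \dots \pm \frac{(n-i)!}{(k-i)!\,(n-k)!}(1-kx)^{n-1}\right\}.$$

The equivalence established in § 2 shows that this expression will give also the probability that just i of the fractions g shall exceed x.

The probability that i or more values shall exceed x is therefore

$$\sum_{i' \ge i} f(i').$$

But

$$\sum_{i'=i}^{j}(-1)^{j-i'}\frac{n!}{i'!\,(j-i')!\,(n-j)!} = (-1)^{j-i}\frac{n!\,(j-1)!}{j!\,(n-j)!\,(j-i)!\,(i-1)!}.$$

Hence the probability of i or more gaps is

$$\sum_{j=i}^{k}(-1)^{j-i}\frac{n!\,(j-1)!}{j!\,(n-j)!\,(j-i)!\,(i-1)!}\,(1-jx)^{n-1}.$$

When i = 1, we thus have

$$n(1-x)^{n-1} - n(n-1)\frac{(1-2x)^{n-1}}{2} + \frac{n(n-1)(n-2)}{2!}\,\frac{(1-3x)^{n-1}}{3} - \dots \pm \frac{n!}{(k-1)!\,(n-k)!}\,\frac{(1-kx)^{n-1}}{k};$$

when i = 2,

$$n(n-1)\frac{(1-2x)^{n-1}}{2} - n(n-1)(n-2)\frac{(1-3x)^{n-1}}{3} + \dots \pm \frac{n!}{(k-2)!\,(n-k)!}\,\frac{(1-kx)^{n-1}}{k},$$

and so on.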

The first of these merely verifies the test of significance for the largest of n fractions. The 
second may be used in a test whether the second largest is significant, such as might be 
useful if, when the largest is doubtfully significant, it may still be suspected that the two 
largest are due to some systematic causes. The second largest fraction would then be 
equated to x. The other cases in which the second largest component is of interest in the 
interpretation of a series are more complex, and can only properly be discussed in relation 
to the particular facts exhibited by the data, and the more plausible hypotheses involving 
periodic disturbances which happen to be in view. For example, if the two most important 
periods are adjacent, the evidence for reality will be stronger than would appear if they 
were tested by the formula above, and the same is true, in some material, when one large 
contribution pertains to a period nearly half that of another. Special distribution problems
arise in considering the tests of significance appropriate in each case. 


I am indebted to Mr Stevens for the table of numerical values for testing the significance 
of the second largest harmonic component, with some extension of the values for the largest 
harmonic component given in my earlier paper. 




37.17

17


R. A. FISHER


SUMMARY

The identity of the solutions obtained for two problems apparently quite unconnected 
is shown to be no coincidence, but due to their intrinsic equivalence. 

The further formulae developed by Stevens are also relevant to one of the cases which 
may arise when harmonic components other than the greatest suggest the reality of periodic 
disturbances. 


A TABLE OF THE TEST OF SIGNIFICANCE IN HARMONIC ANALYSIS. 
W. L. STEVENS. 

Five per cent values of g1 and g2, the largest and second largest fractions

  n      g1      δ1      g2      δ2
  3   0.87090     -   0.43545     -
  4   0.76792     -   0.39863     -
  5   0.68377     -   0.36704     -
  6   0.61615     -   0.34021     -
  7   0.56115     -   0.31729     -
  8   0.51569     -   0.29751     1
  9   0.47749     -   0.28028     3
 10   0.44495     -   0.26511     8
 15   0.33462     1   0.21016    33
 20   0.27040     6   0.17547    52
 25   0.22805     8   0.15139    64
 30   0.19784    10   0.13360    70
 35   0.17513    12   0.11986    74
 40   0.15738    14   0.10890    75
 45   0.14310    14   0.09993    75
 50   0.13135    14   0.09244    75

Approximations to g1 and g2 may be found by using only the first terms of the respective series.
These approximations are in excess of the correct values by amounts δ1 and δ2 listed in the table;
the approximate values of g1 and g2 are therefore g1 + δ1 and g2 + δ2. The errors are even less for smaller
probabilities. At the 1 % level for n = 50, they are respectively 1 and 21 in the fifth place. The first
term approximation is therefore usually adequate in the test of significance.


REFERENCES 

W. L. Stevens (1939). "Solution to a geometrical problem in probability." Ann. Eugen., Lond., 9, 316-20.
R. A. Fisher (1929). "Tests of significance in harmonic analysis." Proc. Roy. Soc. A, 125, 54-9.




38.181a 


38 

THE NEGATIVE BINOMIAL DISTRIBUTION 


Reprinted from Annals of Eugenics, Vol. XI, Pt. II, pp. 182-187, 1941.



B.182 


THE NEGATIVE BINOMIAL DISTRIBUTION 

By R. A. FISHER, F.R.S.

Although the algebra of the two cases is equivalent, the positive and negative binomial
expansions play very different parts as statistical distributions.

The positive binomial $(q+p)^n$

occurs normally with n a known integer, but the fractions p and q = 1 - p, unknown. The
case in which n also is unknown is conceivable, but rather artificial for the following reasons:

If n is not integral the expansion after a certain stage develops negative coefficients; 
these cannot be interpreted as negative frequencies, so that the expansion does not corre¬ 
spond with any distribution. 

There remains the case in which n is necessarily integral, although unknown. A variety 
of problems may be constructed of this sort, all entirely academic. With a sufficiently large 
sample n is necessarily one less than the number of frequency classes, and is thus determined 
without reference to the actual frequencies. 

The negative binomial, on the other hand, which, following Haldane (1941), we may write

$$(q-p)^{-k}, \qquad q = 1+p, \quad k \text{ positive},$$

gives on expansion the term

$$\frac{(k+x-1)!}{x!\,(k-1)!}\,\frac{p^x}{q^{k+x}},$$

which is positive for all positive values of x, whether k is integral or not. Consequently, in
this case, there is a practical problem in the simultaneous estimation of p and k to which
the positive binomial offers no analogy.

In experimental sampling the negative binomial with unknown exponent arises in a simple 
extension of the conditions which give rise to the Poisson Series. The Poisson Series arises 
when equal samples are taken from perfectly homogeneous material. It is completely 
determined by the average or expected number, m, of occurrences per sample. If unequal
samples were taken, or if the material were not perfectly homogeneous, the value of m
would vary from sample to sample. Since m is necessarily positive, the simplest frequency
distribution which allows some variation of m is the Eulerian distribution, familiar as that
of χ², in which the frequency element is

$$\frac{1}{(k-1)!}\,p^{-k}\,m^{k-1}\,e^{-m/p}\,dm.$$

For χ² the parameter k is always the half of a positive integer; in general it may be any
number exceeding zero.

When m varies in this way the frequency of occurrence of x units in the sample is

$$\int_0^\infty \frac{1}{(k-1)!}\,p^{-k}\,m^{k-1}\,e^{-m/p}\cdot e^{-m}\,\frac{m^x}{x!}\,dm.$$




R. A. FISHER 


88.183 

183 


This integral also is of the Eulerian type having the value

$$\frac{(k+x-1)!}{x!\,(k-1)!}\,\frac{p^x}{(1+p)^{k+x}},$$

and this is identical with the standard form for the negative binomial. The variance of m
always increases the variance of x for a given mean value, so that a positive binomial
distribution cannot be obtained in this way, for it would correspond with m having a negative
variance.
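
The identity between the mixed Poisson series and the negative binomial can be checked numerically. The following is a minimal sketch, assuming the mixing distribution of m is the Eulerian (gamma) form with index k and scale p, so that its mean is pk; the function names, the upper limit of integration and the use of Simpson's rule are illustrative choices and not part of the original argument.

```python
from math import exp, lgamma, log


def neg_binomial_term(x, p, k):
    """General term (k+x-1)!/(x!(k-1)!) * p^x / q^(k+x), with q = 1 + p."""
    q = 1.0 + p
    return exp(lgamma(k + x) - lgamma(x + 1) - lgamma(k)
               + x * log(p) - (k + x) * log(q))


def mixed_poisson_term(x, p, k, upper=80.0, steps=20000):
    """Simpson's rule for the Poisson term averaged over the Eulerian
    distribution of m. Assumes x + k > 1, so the integrand vanishes at m = 0,
    and an even number of steps."""

    def integrand(m):
        if m == 0.0:
            return 0.0
        log_mixing = (k - 1.0) * log(m) - m / p - k * log(p) - lgamma(k)
        log_poisson = -m + x * log(m) - lgamma(x + 1)
        return exp(log_mixing + log_poisson)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for s in range(1, steps):
        total += (4.0 if s % 2 else 2.0) * integrand(s * h)
    return total * h / 3.0


if __name__ == "__main__":
    for x in (3, 5):
        print(x, neg_binomial_term(x, 2.0, 1.5), mixed_poisson_term(x, 2.0, 1.5))
```

A non-integral k is used deliberately, since it is for such values that the negative binomial has no positive binomial analogue.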

2. The efficiency of fitting the first two moments 
The binomial with known exponent is efficiently fitted by the observed mean; it is there¬ 
fore rational, and not inconvenient, to fit the negative binomial, using the first two moments. 
Jeffreys (1939) has pointed out that this process is not efficient. 

The expressions for the moments of the negative binomial are equivalent to those for the
positive binomial, changing the sign of p, and remembering that k corresponds to -n,
and q = 1 + p.

$$\mu_1 = pk, \qquad \mu_3 = pq(q+p)k,$$

$$\mu_2 = pqk, \qquad \mu_4 - 3\mu_2^2 = pq(1+6pq)k.$$

Consequently, for large samples, for which case alone the method of moments need be
investigated, we may use the equations of estimation

$$p = \frac{m_2 - \bar{x}}{\bar{x}}, \qquad k = \frac{\bar{x}^2}{m_2 - \bar{x}},$$

where $\bar{x}$ is the mean, and $m_2$ the variance as estimated from the sample.

To examine the efficiency of the method we shall need the determinant of the covariance
matrix of p and k so estimated; this may be found as follows without determining the
covariance matrix itself.

The covariance matrix of $\bar{x}$ and $m_2$ for large samples of N is in general

$$\frac{1}{N}\begin{pmatrix}\mu_2 & \mu_3\\ \mu_3 & \mu_4-\mu_2^2\end{pmatrix};$$

substituting for p and k, this gives the determinant

$$\frac{1}{N^2}\begin{vmatrix} pqk & pq(q+p)k\\ pq(q+p)k & pq(1+6pq)k + 2p^2q^2k^2\end{vmatrix} = \frac{2}{N^2}\,p^3q^3k^2(k+1).$$

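To derive from this the determinant of the covariance matrix for the estimates of p and k,
we need only multiply by the square of the Jacobian

$$\frac{\partial(p, k)}{\partial(\bar{x}, m_2)},$$

writing for $\bar{x}$ and $m_2$ their expected values. The Jacobian is

$$\begin{vmatrix} -\dfrac{m_2}{\bar{x}^2} & \dfrac{1}{\bar{x}} \\[2ex] \dfrac{\bar{x}(2m_2-\bar{x})}{(m_2-\bar{x})^2} & -\dfrac{\bar{x}^2}{(m_2-\bar{x})^2}\end{vmatrix} = -\frac{1}{m_2-\bar{x}},$$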



88.184 

184 


THE NEGATIVE BINOMIAL DISTRIBUTION 


or $-1/p^2k$ on substitution. The determinant of the covariance matrix of p and k estimated
by the first two moments is, therefore,

$$\frac{2q^3(k+1)}{pN^2}.$$

We may compare this with the corresponding determinant for any method of efficient
estimation.

The most convenient way of doing this is to calculate the information matrix, which will 
be the reciprocal of the covariance matrix for efficient estimation. 

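Taking the general term of the negative binomial,

$$C = \frac{(k+x-1)!}{x!\,(k-1)!}\,\frac{p^x}{q^{k+x}}.$$

It appears that

$$-\frac{\partial^2}{\partial p^2}\log C = \frac{x}{p^2} - \frac{k+x}{q^2},$$

whence, substituting its mean value pk for x, we have

$$\frac{k}{pq},$$

in accordance with the well-known fact that if k were given, p would be efficiently estimated
from the observed mean. Next

$$-\frac{\partial^2}{\partial p\,\partial k}\log C = \frac{1}{q}.$$

Finally,

$$-\frac{\partial^2}{\partial k^2}\log C = F'(k-1) - F'(k+x-1) = \frac{1}{k^2} + \frac{1}{(k+1)^2} + \dots + \frac{1}{(k+x-1)^2},$$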

and this expression averaged for varying x gives in the form 


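$$\sum_{x=1}^{\infty}\frac{(k+x-1)!}{x!\,(k-1)!}\,\frac{p^x}{q^{k+x}}\left\{\frac{1}{k^2}+\frac{1}{(k+1)^2}+\dots+\frac{1}{(k+x-1)^2}\right\}.$$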

It is a curious fact that this awkward looking expression can be transformed into one 
suitable for the comparison we have in view. If 

$r = p/q$, $1/q = 1 - r$,

and 




+ -2- 




I ^ k{k+lHk+2 )J 




$$\frac{r}{k} + \frac{r^2}{2k(k+1)} + \frac{4r^3}{6k(k+1)(k+2)} + \dots = \sum_{x=1}^{\infty}\frac{(x-1)!\,(k-1)!}{x\,(k+x-1)!}\,r^x.$$


In this form it is easily seen that the determinant of the information matrix is simply

$$\frac{N^2}{pq}\sum_{x=2}^{\infty}\frac{(x-1)!\,k!}{x\,(k+x-1)!}\,r^x.$$




R. A. FISHER 


38.185 

185 


If the determinant of the covariance matrix corresponding with any method of estimation
is multiplied by this expression, we have the reciprocal of the efficiency. For the method of
moments

$$\frac{1}{E} = 1 + \frac{4}{3}\,\frac{p}{q(k+2)} + 3\,\frac{p^2}{q^2(k+2)(k+3)} + \dots$$

In accordance with the general theory E is always less than unity. The expression shows
that it is near to unity when

$$\frac{p}{q(k+2)} = \frac{1}{(1+1/p)(k+2)}$$

is small.

When the mean is small, e.g. $\bar{x}$ = 0.1, high efficiencies occur even when k is as low as unity,
for which value the expression above is 1/33; low efficiencies are confined to the region
where k → 0. At this extreme if k > [illegible] the value is less than 1/20.

When the mean is 1.0, k must be as high as 3 for the value to fall to 1/20.

When the mean is 10, k must be 9 for the value to fall to 1/20.9.

However high $\bar{x}$ may be, values of k above 18 will bring this down below 1/20.

Thus if p is less than 1/9 for any value of k, or if k exceeds 18 for any value of p, high
efficiency is assured; for intermediate values, if the product (1 + 1/p)(k + 2) exceeds 20, the
efficiency is satisfactorily high.
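
The rule of thumb above can be tried out numerically. The sketch below evaluates the product (1 + 1/p)(k + 2) and a series for 1/E. The general term used, 2 j!/(j+1) (p/q)^(j-1) / ((k+2)(k+3)...(k+j)) for j = 2, 3, ..., is an assumption: it is a reading of the series that agrees with the successive terms printed in Example 1, not a formula quoted from the text. The function names are hypothetical.

```python
def product_index(p, k):
    """The product (1 + 1/p)(k + 2); large values indicate high efficiency."""
    return (1.0 + 1.0 / p) * (k + 2.0)


def inverse_efficiency(p, k, tol=1e-14, max_terms=100000):
    """1/E for fitting by the first two moments (assumed series, see lead-in)."""
    r = p / (1.0 + p)
    total = 1.0
    j = 2
    term = (4.0 / 3.0) * r / (k + 2.0)  # the j = 2 term
    while term > tol and j < max_terms:
        total += term
        # ratio of the (j+1)th term to the jth term
        term *= (j + 1) ** 2 / (j + 2) * r / (k + j + 1)
        j += 1
    return total


if __name__ == "__main__":
    # mean 0.1 with k = 1, mean 1 with k = 3, mean 10 with k = 9
    for mean, k in ((0.1, 1.0), (1.0, 3.0), (10.0, 9.0)):
        p = mean / k
        print(mean, k, product_index(p, k), 1.0 / inverse_efficiency(p, k))
```

With these inputs the product takes the values 33, 20 and 20.9 quoted in the text, which is a useful check that p has been derived from the mean as mean/k.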


3. Numerical examples 

Example 1. Table 1 gives a sample of sheep classified according to the number of ticks 
found on each. (Data due to A. Milne, King’s College, Newcastle-on-Tyne.) 

Table 1

Number of ticks (x)   Number of sheep (f)   f(x-3)   f(x-3)²
         0                     7              -21       63
         1                     9              -18       36
         2                     8               -8        8
         3                    13                -        -
         4                     8                8        8
         5                     5               10       20
         6                     4               12       36
         7                     3               12       48
         8                     -                -        -
         9                     1                6       36
        10                     2               14       98
      Total                   60              +15      353


Fitting by the first two moments we have

$\bar{x}$ = 3.25, $m_2 = s^2$ = 349.25 ÷ 59 = 5.9194915,





38.186 

180 THE NEGATIVE BINOMIAL DISTRIBUTION 

giving the estimates

p = 0.821382, 1/p = 1.217460, k = 3.956746,
and (1 + 1/p)(k + 2) = 13.21.

From this we may guess the efficiency to be about 90 %. The actual terms in 1/E are

1
0.1009
0.0147
0.0027
0.0006
0.0001
------
1.1190        E = 0.8937.
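
The arithmetic of Example 1 can be retraced from the frequencies of Table 1. The sketch below takes the frequencies as read from the table (no sheep carrying 8 ticks), uses the divisor N - 1 = 59 for the variance as in the text, and applies the equations of estimation p = (m2 - mean)/mean and k = mean²/(m2 - mean). The names `TICKS_TO_SHEEP` and `moment_fit` are hypothetical.

```python
# Frequencies of Table 1: number of ticks -> number of sheep.
TICKS_TO_SHEEP = {0: 7, 1: 9, 2: 8, 3: 13, 4: 8, 5: 5,
                  6: 4, 7: 3, 8: 0, 9: 1, 10: 2}


def moment_fit(freq):
    """Return mean, variance (divisor N - 1), and the moment estimates p, k."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    p = (m2 - mean) / mean
    k = mean ** 2 / (m2 - mean)
    return mean, m2, p, k


if __name__ == "__main__":
    mean, m2, p, k = moment_fit(TICKS_TO_SHEEP)
    print(f"mean = {mean}, m2 = {m2:.7f}")
    print(f"p = {p:.6f}, 1/p = {1 / p:.6f}, k = {k:.6f}")
    print(f"(1 + 1/p)(k + 2) = {(1 + 1 / p) * (k + 2):.2f}")
```

The same function applied to the frequencies of a heavier infestation shows at once, through a small value of the product, when the method of moments should not be trusted.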

With efficiency below 90 % many workers would think a more accurate fitting desirable. 
For this purpose the method of Haldane’s note in this number may be recommended. 

Example 2. As an example with a somewhat heavier rate of infestation we may take the 
series 

Table 2

Ticks  Sheep    Ticks  Sheep    Ticks  Sheep
  0      4        9      2       18      -
  1      5       10      2       19      1
  2     11       11      5       20      -
  3     10       12      -       21      1
  4      9       13      2       22      1
  5     11       14      2       23      1
  6      3       15      1       24      -
  7      5       16      1       25      2
  8      3       17      -
                                Total    82


Here $\bar{x}$ = 538 ÷ 82 = 6.5609756, s² = 34.767841.

The moment estimates are

p = 4.299188, k = 1.526096
and (1 + 1/p)(k + 2) = 4.35.

In this case no further calculation is needed to show that the method of moments is 
decidedly inefficient. 

The result of fitting this example by maximal likelihood gives, of course, somewhat
different estimates


Fitted by                   p           k
1st and 2nd moments     4.299188    1.526096
Likelihood              3.691175    1.777476





R. A. FISHER 


38.187 

187 

but such differences would scarcely mislead one as to the level of efficiency. The efficient
values obtained by likelihood merely give a value of (1 + 1/p)(k + 2) of 4.80 in place of 4.35.

The efficient solution does not give a bad fit, in spite of the abrupt changes in frequency,
e.g. between 1 and 2 ticks, or again between 5 and 6, which the observed series shows.
Grouping in eleven classes we have


Number      Expected number    Observed    Difference    (a - m)²/m
of ticks    of sheep (m)          a          a - m
   0            5.256             4         -1.256         0.3001
   1            7.350             5         -2.350         0.7514
   2            8.032            11         +2.968         1.0967
   3            7.958            10         +2.042         0.5240
   4            7.478             9         +1.522         0.3098
   5            6.799            11         +4.201         2.5957
   6            6.043             3         -3.043         1.5323
  7-8           9.844             8         -1.844         0.3454
  9-11          9.990             9         -0.990         0.0981
 12-15          7.232             5         -2.232         0.6889
  16-           6.018             7         +0.982         0.1602
 Total         82.000            82          0             8.4026


Since, in addition to the total frequency, two parameters have been efficiently fitted, χ²
has eight degrees of freedom. The value of χ² is thus very near to its expected value. In spite
of their apparent regularity the deviations are no larger than might often be due to chance.
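
The expected numbers and the value of χ² can be recomputed from the likelihood estimates. This sketch assumes the expected frequencies are N times the negative binomial terms, generated by the recurrence P(x+1) = P(x) (k+x)/(x+1) · p/q starting from P(0) = q^(-k), with the last class taking whatever frequency remains. The variable names and the grouping list are written out for illustration only.

```python
def nb_probabilities(p, k, x_max):
    """Negative binomial terms P(0), ..., P(x_max) for q = 1 + p."""
    q = 1.0 + p
    r = p / q
    probs = [q ** (-k)]
    for x in range(x_max):
        probs.append(probs[-1] * (k + x) / (x + 1) * r)
    return probs


N = 82
P_HAT, K_HAT = 3.691175, 1.777476  # likelihood estimates quoted in the text

# Classes 0, 1, ..., 6, 7-8, 9-11, 12-15; the final class is 16 and over.
GROUPS = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6),
          (7, 8), (9, 11), (12, 15)]
OBSERVED = [4, 5, 11, 10, 9, 11, 3, 8, 9, 5, 7]

probs = nb_probabilities(P_HAT, K_HAT, 15)
expected = [N * sum(probs[lo:hi + 1]) for lo, hi in GROUPS]
expected.append(N - sum(expected))  # tail class, 16 ticks or more

chi_square = sum((a - m) ** 2 / m for a, m in zip(OBSERVED, expected))

if __name__ == "__main__":
    for m, a in zip(expected, OBSERVED):
        print(f"{m:7.3f} {a:3d} {a - m:+7.3f} {(a - m) ** 2 / m:7.4f}")
    print(f"chi-square = {chi_square:.4f} on {len(OBSERVED) - 3} degrees of freedom")
```

Small differences in the last decimal place from the printed table are to be expected, since the printed contributions were formed from expectations rounded to three places.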


4. Summary 

The cases of the positive and negative binomial distributions, in spite of their algebraic 
similarity, are very different in their applications, and in the statistical problems to which 
they give rise. 

With the negative binomial we ordinarily require to estimate the exponent in addition 
to the mean of the distribution. This can be done from the first two moments, but the
process has been recognized as inefficient, and in the present note the theoretical efficiency 
is calculated so as to make it easy to judge in practical cases whether a more exact fitting 
by maximal likelihood is required. 


REFERENCES 

J. B. S. Haldane (1941). 'The fitting of binomial distributions.' Ann. Eugen., Lond., 11, 179.
H. Jeffreys (1939). Theory of Probability, p. 200. Oxford: Clarendon Press.






89.340a 


39 

THE THEORY OF CONFOUNDING IN FAC¬ 
TORIAL EXPERIMENTS IN RELATION 
TO THE THEORY OF GROUPS 


AUTHOR’S NOTE 

The first three sections of Paper 39, occupying less than three pages,
give the general theory which is extended to more complex cases in
Paper 40. The remainder of Paper 39 is mainly occupied by a
catalogue completing the solution of a group of problems which in the
literature had received only rather unsystematic and adventitious
treatment.

Group properties are so abstract that the language in which the 
ideas of the theory are expressed requires the utmost care if the ideas 
are to be conveyed to the reader’s mind. Every effort has been made 
in these pages to make clear the necessary connection of each idea 
with those preceding. 

Practical workers with experiments involving confounding should
gain familiarity through the practical approach developed in The
Design of Experiments, which in the third edition illustrates the full
use of factors at two levels.


Reprinted from Annals of Eugenics, Vol. XI, Pt. IV, pp. 341-353, 1942.



39.341 


THE THEORY OF CONFOUNDING IN FACTORIAL EXPERI¬ 
MENTS IN RELATION TO THE THEORY OF GROUPS 

By R. A. FISHER
1. Introduction

When the treatments in an experiment introduce all combinations of n factors, each at two levels,
so that they are $2^n$ in number, it has long been known that it is possible in many cases to divide
each replication into two, four or more blocks in such a way that the contrasts between blocks
constitute only such interactions between the primary factors as are of minor importance to the
experimenter. The great advantage of limiting the size of the block lies in the fact that in this way
the contents of each block may be made much more homogeneous than if it had been of larger size.

In a very large class of experiments of this type we are concerned to evaluate the effects of each
individual factor with high precision, and to discover whether any pair of the factors tested show
an appreciable interaction. If in these respects the precision of the experiment can be increased,
it is usually advantageous to do so at the expense of foregoing information as to the reality of
one or more of the interactions involving three or more factors.

Various systems of confounding, using factors up to six in number, have been discussed by 
Barnard (1936) and Yates (1937). In the present paper I propose to develop the connexion of 
the subject with that of Abelian groups, to prove a general proposition connecting the minimal 
size of block required with the number of factors involved, and to supply a catalogue of systems 
of confounding available up to fifteen factors. 

2. The group properties of treatments and comparisons 
A group may be formed, of which the elements are all the selections that can be made of none
or more out of n letters. The order of the group is $2^n$. The product of any two elements is formed
by combining the letters they contain, deleting any they may have in common. The group is,
therefore, Abelian.

As is well known, the treatments of a factorial experiment correspond one to one with the
elements of this group. Any treatment combination chosen as control is designated by the symbol
(1), while the symbols for all other treatment combinations are designated by the letters
representing the factors in which they differ from the control. Thus (abc) stands for the treatment
differing from the control in factors A, B, C, and agreeing with it in all other factors.

Every element of the group, other than the identity, divides the whole into two halves, having
respectively even and odd numbers of letters in common with that element. The identity always
belongs to the even half. The elements other than the identity thus each correspond with a
particular comparison in which half of the treatments tested is contrasted with the other half.
The comparisons of particular interest to the experimenter comprise those due to single factors,
n in number, represented by single letters, A, B, C, ... and to the interactions of these, two
at a time, AB, AC, BC, etc. We shall be concerned that these shall not be confounded.

Now, if a treatment has an even number of letters in common with a contrast $S_1$ and also an
even number of letters in common with $S_2$, it must equally have an even number of letters in
common with the product $S_1S_2$. Consequently the contrasts with which it has an even number of
letters in common constitute, with the identity, an entire subgroup. The halvings corresponding



S9.342 

342 


CONFOUNDING IN FACTORIAL EXPERIMENTS 


to any two elements are orthogonal, i.e. if the letter A occurs in $S_1$, but not in $S_2$, then if g
is the aggregate of elements even for both, Ag is an equal aggregate even only for the second.
Consequently, the number of elements even for all members of a subgroup of order $2^a$ is $2^{n-a}$,
and these must form a subgroup.

In this connexion it may be noted that the number of subgroups of order $2^a$ is equal to the
number of order $2^{n-a}$, and is

$$\frac{(2^n-1)(2^{n-1}-1)\cdots(2^{n-a+1}-1)}{(2^a-1)(2^{a-1}-1)\cdots(2-1)}.$$


If there are $2^a$ blocks it is obvious that $2^a - 1$ interactions will be confounded, and that these
with the identity will form a subgroup of order $2^a$.

Let C be any interaction confounded, and let B be the operator needed to change any treatment 
into any other treatment occupying the same block, then since C is a contrast between whole blocks, 
any two members of the same block must have either both an odd or both an even number of 
letters in common with C. Hence B has an even number of letters in common with C. 

There are only $2^{n-a}$ operators in the subgroup having an even number of letters in common
with all confounded interactions. Hence any treatment when operated on by this subgroup of 
operators yields the set of treatments in the same block. This subgroup of operators is therefore 
called the intrablock subgroup. 

When the number of factors is large, the number of plots in each block may be much less than
the number of blocks. Hence it is easier to specify the intrablock subgroup than the subgroup of
confounded interactions, which may be derived from it. If in any solution we desire to omit any
factor, say X, without diminishing the size of block, we may always do this by omitting the
symbol x from the specification of any element of the intrablock subgroup in which it may
occur. At the same time all elements containing X of the subgroup confounded are deleted,
so dividing its order by two, and halving the number of blocks required to carry out the
experiment. These operations do not introduce any new interaction among those confounded;
consequently, if, in the original solution, no interaction involving less than three factors was confounded,
this will still remain true after any chosen factor has been omitted.

3. The relation between the number of factors and the necessary size of the block
It is a remarkable fact that so long as the number of units in a block exceeds the number of factors
used, arrangements can be found such that no interaction confounded involves less than three
factors. Thus blocks of 8 may be used with so many as 7 factors, or 128 treatment combinations,
so that the number of blocks in the replication would be 16, and blocks of 16 may be used with so
many as 15 factors, or 32,768 treatment combinations, there being then 2048 blocks in a
replication. The general proof of this proposition may be developed as follows:

To arrange the combinations of $2^n - 1$ factors in blocks of $2^n$ units each; first establish a 1:1
correspondence between the factors, or the letters by which they are represented, and the elements,
other than the identity, of a group of order $2^n$. For example, using Greek letters for this latter
group we might have for 15 factors:


A   α        F   βγ       L   αβδ
B   β        G   αβγ      M   γδ
C   αβ       H   δ        N   αγδ
D   γ        J   αδ       O   βγδ
E   αγ       K   βδ       P   αβγδ


Corresponding with each Latin letter, or with each element other than the identity of the 



39.343 

R. A. FISHER 343 

Greek group, there is the subgroup of order 2 to which that element belongs, and the subgroup of
order $2^{n-1}$, or in this case $2^3$, all of the elements of which have an even number of Greek letters in
common with it.

The remainder of the Greek group is a set of $2^{n-1}$ (eight) elements, not including the identity,
and the corresponding set of Latin letters will be chosen to specify a treatment in the same block
as the control treatment.

Every Latin letter of the set of Latin letters which corresponds with the interaction of any two 
elements of the Greek group, contains an odd number of Greek letters in common with this inter¬ 
action, and therefore an odd number of Greek letters in common with one but not both of the 
elements of which it is the interaction. The set is therefore constituted of the Latin letters which 
appear in one but not in both of the sets corresponding with these elements. It is therefore their 
interaction. Hence the $2^n - 1$ sets with the identity form a group. This is chosen as the intrablock
subgroup of the group formed by all combinations of the Latin letters. 

If S and T are any two Latin letters, and ξ any Greek letter which occurs in the representation
of one of them, but not of both, then the set of Latin letters corresponding with ξ contains all of
which the representations contain ξ, and therefore includes one but not both of the letters S and T.
Hence the interaction ST is not confounded.

Evidently, also, the main action T is not confounded, for T will occur in the set corresponding 
with any single Greek letter which the Greek representation of T may contain. 

Hence with only $2^n$ treatments in each block, we may test $2^n - 1$ factors and all their interactions
without confounding any interaction of less than three factors.

The same argument provides the means of identifying the interactions which are confounded.
For, if STU... is any interaction, and if the product of the Greek representations corresponding
with the letters it contains involves the letter ξ, then ξ must be involved in an odd number of these
letters; consequently the interaction has an odd number of letters in common with the set
corresponding with ξ, and is therefore not confounded. The interactions confounded are those, and
those only, for which the product of the Greek representations of the Latin elements involved
reduces to the identity.
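
The construction of this section can be carried out mechanically. In the sketch below each element of the Greek group is coded as a bitmask (α = 1, β = 2, γ = 4, δ = 8), and the Latin letter numbered t is paired with the Greek element whose bitmask is t, which reproduces the correspondence tabulated above (C with αβ, G with αβγ, and so on). The function names and the bitmask coding are illustrative assumptions, not Fisher's notation.

```python
from collections import Counter
from itertools import combinations

LETTERS = "abcdefghjklmnop"  # the lettering used in the text omits i


def intrablock_subgroup(n_greek):
    """Control block for 2**n_greek - 1 factors in blocks of 2**n_greek units."""
    if not 1 <= n_greek <= 4:
        raise ValueError("this sketch only provides letters for up to 15 factors")
    n_factors = 2 ** n_greek - 1
    letters = LETTERS[:n_factors]
    block = [""]  # the control treatment, written (1) in the text
    for greek in range(1, n_factors + 1):
        # Latin letters whose Greek representation has an odd number of
        # Greek letters in common with this element of the Greek group
        chosen = [letters[t - 1] for t in range(1, n_factors + 1)
                  if bin(t & greek).count("1") % 2 == 1]
        block.append("".join(chosen))
    return letters, block


def is_confounded(interaction, block):
    """Confounded if it has an even number of letters in common with
    every treatment of the control block."""
    return all(len(set(interaction) & set(t)) % 2 == 0 for t in block)


def confounded_sizes(letters, block, max_size):
    """Count confounded interactions by the number of factors involved."""
    sizes = Counter()
    for size in range(1, max_size + 1):
        for combo in combinations(letters, size):
            if is_confounded(combo, block):
                sizes[size] += 1
    return dict(sizes)


if __name__ == "__main__":
    letters, block = intrablock_subgroup(3)
    print(block)
    print(confounded_sizes(letters, block, 7))
```

For seven factors the treatments produced are those of the intrablock group quoted in the section on seven factors, and the confounded interactions found by enumeration involve three, four or seven factors only.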

4. Extension to cases in which each factor occurs at more than two levels
If each factor is tested at p levels, where p is a prime number, the same for all factors, the
correspondence with the theory of Abelian groups remains and many of the same conclusions follow.

The chief difference arises from the fact that each factor is now associated, not with a single
comparison, but with a cycle of p - 1 independent comparisons, represented by the powers,
excluding zero, of a single symbol. The interactions of any two factors will then constitute p - 1
such cycles, representable by the products of these powers. Whereas in the case of two levels the
interaction of such comparisons as AB and BC will never involve the factor B, with more than two
levels, if we consider the interactions of AB and BC, it appears that only one of the interaction
cycles will be free from B and involve only the two factors A and C, while the remainder will
involve all three factors. It should be noted that the number of factors involved is never less than
the number in the case of experimentation at two levels, though it may be more.

Consequently, if we experiment with the $p^n$ treatment combinations of n factors, arranging the
treatments in blocks of $p^a$ units in such a way that $p^{n-a} - 1$ treatment comparisons are confounded,
we shall have a subgroup confounded of order $p^{n-a}$, and an intrablock subgroup of order $p^a$.



39.344 

344 CONFOUNDING IN FACTORIAL EXPERIMENTS 

It appears on consideration that if $A^iB^jC^k\dots$ is any one of the comparisons confounded, and
$A^xB^yC^z\dots$ is a member of the intrablock subgroup, or a treatment in the same block as the control,

$$ix + jy + kz + \dots \equiv 0 \pmod p.$$

Given the subgroup confounded, the intrablock subgroup may then be constructed as in the case
p = 2. Further, any of the solutions available when p = 2 will be available in general, with the
assurance that no interaction will involve fewer factors than the corresponding interaction when
p = 2.

Consequently, no interaction of less than three factors need be confounded if $2^a$ exceeds n.
There will, however, be some variety in the numbers of interactions of different orders
confounded, according to which set is chosen. Thus, with 7 factors arranged in blocks of $p^3$ units, we
might choose for confounding a subgroup of order $p^4$ derivable from the generators

ABC, ADE, AFG, $BD^iF^j$.

The cases in which 0, 1 or 2 of the relations

1 + i = p, 1 + j = p, i + j = p, 1 + i + j = p

is or are satisfied, lead in general to three different solutions, though, when p = 3, only two of
these can occur, since i and j can only take the values 1 or 2.

The distribution of the forty cycles of two interactions each, confounded in these two cases, 
according to the numbers of factors involved are as follows: 


No. of factors    i = j = 1    Three other cases    Total no. of cycles
      1               -               -                      7
      2               -               -                     42
      3               5               6                    140
      4              15              11                    280
      5               9              15                    336
      6               8               4                    224
      7               3               4                     64
    Total            40              40                   1093
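
The counts in this table can be checked by direct enumeration for p = 3. The sketch below assumes the four generators are ABC, ADE, AFG and B D^i F^j, represents each comparison by its vector of exponents modulo p, and treats a comparison together with its non-zero powers as one cycle. The function name `confounded_cycle_counts` and the vector coding are illustrative assumptions.

```python
from collections import Counter
from itertools import product


def confounded_cycle_counts(i, j, p=3):
    """Cycles of the confounded subgroup, counted by number of factors involved.

    Exponent vectors are ordered (A, B, C, D, E, F, G).
    """
    generators = [
        (1, 1, 1, 0, 0, 0, 0),  # ABC
        (1, 0, 0, 1, 1, 0, 0),  # ADE
        (1, 0, 0, 0, 0, 1, 1),  # AFG
        (0, 1, 0, i, 0, j, 0),  # B D^i F^j (assumed form of the fourth generator)
    ]
    seen = set()
    counts = Counter()
    for coeffs in product(range(p), repeat=len(generators)):
        if not any(coeffs):
            continue  # skip the identity
        element = tuple(
            sum(c * g[t] for c, g in zip(coeffs, generators)) % p
            for t in range(7)
        )
        # a cycle consists of the p - 1 non-zero powers of an element
        cycle = frozenset(tuple(m * e % p for e in element) for m in range(1, p))
        if cycle in seen:
            continue
        seen.add(cycle)
        counts[sum(1 for e in element if e)] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(confounded_cycle_counts(1, 1))
    print(confounded_cycle_counts(2, 2))
```

The case i = j = 1 satisfies only the relation 1 + i + j = p, while i = j = 2 satisfies two of the four relations, which is why the two columns of the table differ.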


The solutions possible seem, however, to be numerous. 

5 . Seven factors 

The solution for 7 factors at two levels, tested in blocks of 8, follows directly from the general 
theorem proved in § 3. The intrablock group contains, in addition to the control, the 7 treatments 
{abdg) {abef) {acdf) {aceg) (bcde) {bcfg) {defg). 

It may be remarked that these form a solution of the problem of arranging 7 treatments in 
balanced incomplete blocks of 4 units each. However the matter be viewed, there are 30 such 
selections possible, and of these 8 have no interaction in common with any chosen one. For that 
set out above these are: 

(abcd) (abeg) (acfg) (adef) (bcef) (bdfg) (cdeg)

(abcd) (abfg) (acef) (adeg) (bceg) (bdef) (cdfg)

(abce) (abdf) (acfg) (adeg) (bcdg) (befg) (cdef)

(abce) (abfg) (acdg) (adef) (bcdf) (bdeg) (cefg)

(abcf) (abeg) (acde) (adfg) (bcdg) (bdef) (cefg)

(abcf) (abde) (acdg) (aefg) (bceg) (bdfg) (cdef)

(abcg) (abdf) (acde) (aefg) (bcef) (bdeg) (cdfg)

(abcg) (abde) (acef) (adfg) (bcdf) (befg) (cdeg)




39.345 

R. A. FISHER 346 

If it were possible to construct a balanced set, using only 5 replications, each of the 35 selections
of 4 letters would have to occur in only 1 replication. Since, however, every pair of 8 sets listed
above has one element in common, a balanced set in 5 replications is not possible.

Balanced sets do, however, exist, both in 10 and in 15 replications. One type of solution for 10 
replications is obtained by observing that the 28 treatments common to the 28 pairs of the 8 sets 
listed are all different, so that we have only to repeat the first solution twice, and each of the other
solutions having no element in common with it once each, to obtain a set of 10 in which JLojcA, 

iivL 5$ C^^CJUUV^ tuui^ o»o tkt . 

This equalization also extends to the interactions confounded, for 7 of these are represented by 
the same symbols as the 7 treatments of the control block, while the remaining 8 are respectively 
the interaction of all 7 treatments and the 7 interactions of 3 factors complementary to those 
already used. 

Obviously, there are 30 balanced sets of the type set out above. A second system, of which 
there are 262, is illustrated below, where we specify the 3-factor interactions confounded:


ABE  ACD  AFG  BCG  BDF  CEF  DEG
ABE  ACD  AFG  BCF  BDG  CEG  DEF
ABC  ADF  AEG  BDE  BFG  CDG  CEF
ABC  ADG  AEF  BDE  BFG  CDF  CEG
ABG  ACE  ADF  BCD  BEF  CFG  DEG
ABF  ACE  ADG  BCD  BEG  CFG  DEF
ABD  ACF  AEG  BCG  BEF  CDE  DFG
ABD  ACG  AEF  BCF  BEG  CDE  DFG
ABG  ACF  ADE  BCE  BDF  CDG  EFG
ABF  ACG  ADE  BCE  BDG  CDF  EFG


Of balanced sets of 15 replications, three types may be mentioned, one of which is unique in
that it is unaltered by any even permutation, so that there are only two solutions in the set, of
which the second is, of course, constituted by the 15 solutions not used in the first:


ABC  ADE  AFG  BDF  BEG  CDG  CEF
ABC  ADF  AEG  BDG  BEF  CDE  CFG
ABC  ADG  AEF  BDE  BFG  CDF  CEG
ABF  ADE  ACG  BCE  BDG  CDF  EFG
ABG  ADE  ACF  BCD  BEF  CEG  DFG
ABD  ACE  AFG  BCG  BEF  CDF  DEG
ABE  ACD  AFG  BCF  BDG  CEG  DEF
ABE  ACF  ADG  BDF  BCG  CDE  EFG
ABG  ACD  AEF  BDF  BCE  CFG  DEG
ABD  ACG  AEF  BCF  BEG  CDE  DFG
ABF  ACE  ADG  BCD  BEG  CFG  DEF
ABG  ACE  ADF  BCF  BDE  CDG  EFG
ABD  ACF  AEG  BCE  BFG  CDG  DEF
ABF  ACD  AEG  BCG  BDE  CEF  DFG
ABE  ACG  ADF  BCD  BFG  CEF  DEG

39.346 

346 CONFOUNDING IN FACTORIAL EXPERIMENTS 

A second set, this time of 70 solutions, may be formed by replacing the first three solutions used
above by

ABC  ADE  AFG  BDG  BEF  CDF  CEG
ABC  ADG  AEF  BDF  BEG  CDE  CFG
ABC  ADF  AEG  BDE  BFG  CDG  CEF

while a third system of balanced replications is provided by


ABC  ADE  AFG  BDF  BEG  CDG  CEF
ABC  ADF  AEG  BDG  BEF  CDE  CFG
ABC  ADG  AEF  BDE  BFG  CDF  CEG
ABF  ADE  ACG  BCE  BDG  CDF  EFG
ABG  ADE  ACF  BCD  BEF  CEG  DFG
ABF  ACE  ADG  BCG  BDE  CDF  EFG
ABD  ACF  AEG  BCG  BEF  CDE  DFG
ABE  ACF  ADG  BCD  BFG  CEG  DEF
ABG  ACD  AEF  BDF  BCE  CFG  DEG
ABD  ACG  AEF  BCF  BEG  CDE  DFG
ABF  ACD  AEG  BCE  BDG  CFG  DEF
ABG  ACE  ADF  BCF  BDE  CDG  EFG
ABE  ACD  AFG  BCG  BDF  CEF  DEG
ABD  ACE  AFG  BCF  BEG  CDG  DEF
ABE  ACG  ADF  BCD  BFG  CEF  DEG


It may be that there are other balanced arrangements beyond these three listed. 

6. Methods of confounding eight to fifteen factors
in blocks of sixteen plots
Eight factors

There are five arrangements with 8 factors, in addition to the one mentioned under (j) below.

(a) The first example possesses some remarkable features. We derive all blocks from that
containing the control, i.e.

(1)     (aceg)  (bceh)  (cdef)
(abcd)  (acfh)  (bcfg)  (cdgh)
(abef)  (adfg)  (bdeg)  (efgh)
(abgh)  (adeh)  (bdfh)  (abcdefgh)

The corresponding interactions constitute a subgroup of order 16, the intrablock subgroup. This
is a group of operators which transform any treatment into one occurring in the same block.
These operators are therefore

I     ABCD  ABEF  ABGH
ACEG  ACFH  ADEH  ADFG
BCEH  BCFG  BDEG  BDFH
CDEF  CDGH  EFGH  ABCDEFGH

The interactions which are confounded (including the identity) always constitute a subgroup,
each of the elements of which has an even number (including zero) of symbols in common with
every element of the intrablock subgroup. Since in this case every element contains a number
divisible by 4, and consequently has an even number of symbols in common with every other,
while the subgroup is of order $2^4$ out of $2^8$ in the entire group, it follows that the confounded
subgroup is identical with the intrablock subgroup.

The remaining blocks are derived from the first by applying operators not belonging to the 
intrablock subgroup. In particular, the contents of the block containing any treatment may be 
obtained by using the corresponding operator. 
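
The even-overlap rule can be checked mechanically for arrangement (a). The following is a minimal sketch in Python, not part of the original paper: interactions are encoded as bitmasks, group multiplication is taken to be XOR, and the helper names (`mask`, `span`, `orthogonal`) are illustrative assumptions. The four generators are read off the table of operators above.

```python
FACTORS = "ABCDEFGH"

def mask(word):
    """Encode an interaction such as 'ABCD' as a bitmask over FACTORS."""
    return sum(1 << FACTORS.index(ch) for ch in word)

def word(m):
    """Decode a bitmask into factor letters ('I' stands for the identity)."""
    return "".join(ch for i, ch in enumerate(FACTORS) if m >> i & 1) or "I"

def span(generators):
    """Close a set of interactions under multiplication (XOR of bitmasks)."""
    group = {0}
    for g in generators:
        group |= {x ^ g for x in group}
    return group

def orthogonal(group):
    """Interactions sharing an even number of letters with every element of group."""
    return {m for m in range(1 << len(FACTORS))
            if all(bin(m & g).count("1") % 2 == 0 for g in group)}

intrablock = span(mask(w) for w in ["ABCD", "ABEF", "ABGH", "ACEG"])
confounded = orthogonal(intrablock)

print(sorted(word(m) for m in intrablock))
print(confounded == intrablock)  # the two subgroups coincide for arrangement (a)
```

The same two helpers, `span` and `orthogonal`, suffice to derive the confounded group of any of the arrangements listed in this section from its intrablock group.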

The arrangement chosen is unaltered by a group of permutations such as

(ab)(cd)(ef)(gh),
(ac)(bd)(eg)(fh),
(ae)(bf)(cg)(dh),
(bc)(fg), (bce)(dgf), etc.

which generate the permutation group representing the symmetry of a perfect cube. This group
is of order 48. Consequently we may make 8!/48 = 840 selections not mutually related by this
group. These may be generated by choosing one of the 35 ways of dividing eight objects into 2
sets of 4, representing the alternate apices of the cube, and then choosing one of the 24 one to
one correspondences between the two sets of objects.

Any solution contains fourteen interactions of 4 factors, consisting of seven pairs. Any one of
these pairs may be taken to represent sets of alternate apices. Consequently, the solution may be
obtained from each of 7 of the 35 subdivisions. In fact, in addition to those representing the
symmetry of the cube the permutation of 7 letters (bfcehgd) also leaves the solution unchanged.
Further, within each of these seven subdivisions 4 of the 24 correspondences give the same
solution. For example, if in the solution above we regard (adfg) and (bceh) as representing alternate
apices, the permutation group of order four containing (af)(dg), (ag)(df), and (ad)(fg) leaves
the example unaltered.

There are thus only 30 different solutions, corresponding with the 30 solutions for 7 factors in 
blocks of 8, the confounded subgroup of which can be derived from that of the present example 
merely by deleting the letter H. 
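
The counting argument above reduces to a few lines of arithmetic. This short Python sketch only restates the figures given in the text; the variable names are illustrative.

```python
from math import comb, factorial

cube_symmetries = 48                           # order of the symmetry group of the cube
selections = factorial(8) // cube_symmetries   # selections not related by that group
subdivisions = comb(8, 4) // 2                 # ways to split 8 letters into two sets of 4
correspondences = factorial(4)                 # one-to-one matchings between the two sets

# Each solution arises from 7 subdivisions, and from 4 correspondences within each.
distinct_solutions = selections // (7 * 4)

print(selections, subdivisions * correspondences, distinct_solutions)
```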

In this case only interactions of 4 factors are confounded. The 70 interactions of 4 factors may 
all be confounded equally, i.e. in one-fifth of the replications, by using 10 or 15 replications as 
in the analogous solution with 7 factors. 

(b) The intrablock group is

I     ABCD  BCFG   ABDFH
BGH   ABEF  BDEG   ACDGH
CFH   ACEG  CDEF   AEFGH
DEH   ADFG  ABCEH  BCDEFGH

The group of confounded interactions is

I     BCEH  CDEF   ABEFH
ABG   BCFG  CDGH   ACEGH
ACF   BDEG  EFGH   ADFGH
ADE   BDFH  ABCDH  ABCDEFG

These subgroups are now distinct, though for 8 factors in blocks of 16 plots they are necessarily
of the same order. They have a common subgroup of order 4 containing the interactions BCFG,
BDEG, CDEF.

The solution is unaltered by the permutations (bc)(fg), (bcd)(egf), (bg)(cf), a group of order 24.
Hence there are 1680 solutions in the set.



(c) The intrablock group is

I     DEF   BCDH   ACDFG
AEH   ABCF  BEFG   BCEFH
BDG   ACEG  ABCDE  ABDEGH
CGH   ADFH  ABFGH  CDEFGH

The group of confounded interactions is

I     BDF   CEFH   BCDEH
ACH   ABDE  DEGH   BEFGH
AEF   ABGH  ACDEG  ABCDFH
BCG   CDFG  ADFGH  ABCEFG

The solution is unaltered by the group of order 8 generated by (ab)(de)(gh), (cf)(dg)(eh),
(ac)(bf)(eg); the number of solutions is 5040.

(d) The intrablock group is

I     ABEG  BCFG   ACDGH
EH    ABGH  ABDEF  BCDEH
BCD   ACEF  ABDFH  DEFGH
DFG   ACFH  ACDEG  BCEFGH

Unlike the group of confounded interactions this may contain interactions of two factors. The
group of confounded interactions is

I     BDG   BCEH   CDEGH
ABC   CDF   BCFG   ABDEGH
AEH   ABDF  EFGH   ACDEFH
AFG   ACDG  BDEFH  ABCEFGH

The one interaction BCFG is common to the two subgroups; the solution is invariant for a
permutation group of order 8 generated by (eh), (bc)(fg), (bf)(cg); there are therefore 5040 solutions
of this type.

(e) The intrablock group is

I     ACFH  BCEF  CDEH
FG    ACGH  BCEG  ABCDFG
ABCD  ADEF  BDFH  ABEFGH
ABEH  ADEG  BDGH  CDEFGH

The group of confounded interactions is

I     BDE   CDEH   ADEFG
ACE   ABCD  CDFG   BCEFG
ADH   ABEH  EFGH   BDFGH
BCH   ABFG  ACFGH  ABCDEFGH

There is a common subgroup of order 4.

The group of permutations for which the solution is invariant is generated by

(fg), (ab)(cd), (ab)(eh), (ac)(bd), (ace)(bdh).

This is a group of order 48, so that there are 840 solutions of this type.



Nine factors

With 9 factors also five types of solution are possible. 

(f) Intrablock group

I     CDGH   ABCFG  ACEGJ
BCHJ  CEFH   ABDFH  ADEHJ
BDGJ  DEFG   ABEGH  AFGHJ
BEFJ  ABCDE  ACDFJ  BCDEFGHJ

Group of confounded interactions

I     BDFH  DEFG   ACFGJ
ABJ   BDGJ  DEHJ   ADFHJ
ACH   BEFJ  FGHJ   AEGHJ
ADG   BEGH  ABCDF  ABCDGHJ
AEF   CDFJ  ABCEG  ABCEFHJ
BCDE  CDGH  ABDEH  ABDEFGJ
BCFG  CEFH  ABFGH  ACDEFGH
BCHJ  CEGJ  ACDEJ  BCDEFGHJ

The common subgroup is of order 8.

If we omit the factor A we delete all interactions containing A from those confounded, and
delete the letter A from all containing it in the intrablock group. The solution for 8 factors so
obtained is of type (a). There are therefore only 270 solutions of this type. Omitting any of the
other letters leads to a solution of type (b).
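
The two deletion rules (strike the letter out of the intrablock group; drop every confounded interaction containing it) can be illustrated on arrangement (f). This is a sketch under the same bitmask encoding as before, with hypothetical helper names; the intrablock generators are read from the table for (f).

```python
FACTORS = "ABCDEFGHJ"

def mask(word):
    """Encode an interaction as a bitmask over FACTORS."""
    return sum(1 << FACTORS.index(ch) for ch in word)

def span(generators):
    """Close a set of interactions under multiplication (XOR)."""
    group = {0}
    for g in generators:
        group |= {x ^ g for x in group}
    return group

def orthogonal(group):
    """Interactions with an even number of letters in common with every element."""
    return {m for m in range(1 << len(FACTORS))
            if all(bin(m & g).count("1") % 2 == 0 for g in group)}

intrablock_f = span(mask(w) for w in ["BCHJ", "BDGJ", "BEFJ", "ABCFG"])
confounded_f = orthogonal(intrablock_f)

bit_a = mask("A")
# Intrablock rule: delete the letter A wherever it occurs.
reduced_intrablock = {m & ~bit_a for m in intrablock_f}
# Confounded rule: delete every interaction that contains A.
reduced_confounded = {m for m in confounded_f if not m & bit_a}

weights = sorted(bin(m).count("1") for m in reduced_intrablock)
print(weights)
print(reduced_intrablock == reduced_confounded)
```

After deletion every non-identity element has 4 or 8 letters and the two subgroups coincide, which is the distinguishing feature of type (a).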

(g) Intrablock group

I     AEFG  CDEJ    ABDEGH
ABFJ  BCDG  CFGH    ACDFGJ
ACEH  BDFH  ABCDEF  BCEFHJ
ADHJ  BEGJ  ABCGHJ  DEFGHJ

Group of confounded interactions

I     ABGH  ABCDG  CFGHJ
AEJ   ACDF  ABCEH  ABDFHJ
AFH   ACGJ  ABCFJ  ABEFGJ
BDJ   BCEF  ADEFG  ACDEHJ
BFG   BCHJ  ADGHJ  ACEFGH
CDH   DEGH  BDEFH  BCDEGJ
CEG   DFGJ  BEGHJ  BCDFGH
ABDE  EFHJ  CDEFJ  ABCDEFGHJ

The common subgroup is of order 1. Deleting any letter reduces the solution to (c); there are
therefore 5040 solutions. One-fourteenth of the interactions of 3, 4, 5 and 6 factors is confounded
in addition to the interaction of all 9 factors.

(h) The intrablock group is

I     ACFG   ACDGJ  ABDEFG
DFJ   BCDE   ACEFH  ACDEHJ
EGH   ABDFH  BCDGH  BCFGHJ
ABHJ  ABEGJ  BCEFJ  DEFGHJ

The group of confounded interactions may be generated from ABC, AFJ, AGH, BDJ, and
BEH. It contains 7 interactions of three factors, 9 of four, 6 of five, 6 of six and 3 of seven. The
letters A, B and C occur in none of the triple interactions of the intrablock group; deleting any
of these we have solution (c). Deleting the other letters gives (d) in each case.

(i) The intrablock group is

I     BCFG   ABEFJ  BDEHJ
EGH   BDGJ   ACEGJ  ABCDGH
ACHJ  CDFJ   ADEFG  ABFGHJ
ADFH  ABCDE  BCEFH  CDEFGHJ

Deleting E we have solution (e); G and H give (d); A and B give (b), and C, D, F and J give (c).

(j) The intrablock group is 

I DEGJ ACDGH BCGHJ 
FH ACEFJ BCDEF ABDFHJ 
ABDJ ACEHJ BCDEH ABEFGH 
ABEG ACDFG BCFGJ DEFGHJ 

Deleting C gives solution (e); the six letters A, B, D, E, G and J give (d). 

There are 8 interactions of three, 10 of four, 4 of five, six and seven, and 1 of eight factors. 

The group of confounded interactions is generated by ABC, CDJ, CEG, CFH, and ADE. 
Since FH is one of the intrablock interactions, all confounded interactions contain either both 
or neither of these letters, hence if either of these letters is deleted, the other also disappears 
from the group of confounded interactions. There are thus 15 confounded interactions involving 
only the other seven factors, the solution being derivable from that of seven factors in blocks 
of eight merely by splitting each plot for the eighth factor. This really constitutes a sixth method 
of arranging eight factors in blocks of sixteen plots. 

Ten factors in blocks of sixteen plots 

There are four solutions for ten factors. 

(k) The intrablock group is

I      ADFGJ  BCFHJ   ABCDGH
CDJK   ACEHK  BDEGK   ABEFJK
EFGH   ACFGK  BDFHK   ABGHJK
ADEHJ  BCEGJ  ABCDEF  CDEFGHJK

The 63 confounded interactions may be generated from ACJ, ADK, AEG, AFH, BCK and
BEH. There are 8 of three factors, 18 of four, 16 of five, 8 of six and seven, and 5 of eight.
Deleting A or B gives solution (f); the other eight factors lead to solution (i).
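
The numbers of confounded interactions of each order can be tallied by brute force from the intrablock group. The sketch below is an illustration, not Fisher's procedure: it uses the bitmask encoding with four generators read from the table for (k), and it also checks that the six triple interactions named in the text generate the same set.

```python
from collections import Counter

FACTORS = "ABCDEFGHJK"

def mask(word):
    """Encode an interaction as a bitmask over FACTORS."""
    return sum(1 << FACTORS.index(ch) for ch in word)

def span(generators):
    """Close a set of interactions under multiplication (XOR)."""
    group = {0}
    for g in generators:
        group |= {x ^ g for x in group}
    return group

intrablock = span(mask(w) for w in ["CDJK", "EFGH", "ADFGJ", "BCFHJ"])

# Confounded interactions: even overlap with every intrablock element.
confounded = {m for m in range(1 << len(FACTORS))
              if all(bin(m & g).count("1") % 2 == 0 for g in intrablock)}

# Number of confounded interactions by number of factors (identity excluded).
tally = Counter(bin(m).count("1") for m in confounded if m)
print(sorted(tally.items()))

from_text = span(mask(w) for w in ["ACJ", "ADK", "AEG", "AFH", "BCK", "BEH"])
print(from_text == confounded)
```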

(l) The intrablock group is

I     ABFGK  BDFGJ   ACDFHK
ADJK  ACEGJ  CDEGK   BCEFJK
BEHK  AEFGH  ABCDEF  ABCGHJK
CFHJ  BCDGH  ABDEHJ  DEFGHJK

The confounded interactions are those of the group generated from ADG, AFJ, AEK, BEG,
BFH and CEH. There are 9 interactions of three factors, 16 of four, 15 of five, 12 of six, 7 of
seven, 3 of eight, and 1 of nine factors. Deleting A, B, C, D, E or F gives solution (i), deleting
H, J or K gives solution (h), deleting G gives solution (g).



(m) The intrablock group is

I      ABEGJ  BCFGK   ACEFJK
FHJ    ACDGH  ABDFHK  BCDEHJ
DEGK   ACEHK  ABEFGH  BCGHJK
ABDJK  BCDEF  ACDFGJ  DEFGHJK

The 63 interactions confounded are generated from ABC, ADE, AGK, AHJ, BDG and BFJ.
There are 10 of three factors, 16 of four, 12 of five and six, 10 of seven, and 3 of eight. Deleting
F, H, or J leads to solution (j); D, E, G and K give (h); A, B, and C give (i).

(n) The intrablock group is

I     DEFH    ABFGHJ  BCDEFG
ABEK  DGJK    ACDEHJ  BCDHJK
ACFJ  ABDEGJ  ACDFGK  BCEFJK
BCGH  ABDFHK  ACEGHK  EFGHJK

The interactions confounded may be generated from the six triple interactions ABC, AEF,
AJK, BEH, BGK and CFH. There are 10 of three factors, 15 of four, 12 of five, 15 of six, 10
of seven, and 1 of all ten. Deleting any one letter leads to solution (h).


Eleven factors in blocks of sixteen plots 
There are three solutions for eleven factors. 

(o) The intrablock group is

I      BEGHK   ABEFJK  BCFGKL
ADEKL  CDGJK   ACDFHK  ABCDEFG
AFGHJ  CEFHL   ACEGJL  ABCHJKL
BDFJL  ABDGHL  BCDEHJ  DEFGHJKL

There are 127 interactions confounded; these may be generated from the seven triple
interactions ADJ, AEH, AFL, AGK, BDK, BEL and CDL. There are 12 interactions of three
factors, 26 of four, 28 of five, 24 of six, 20 of seven, 13 of eight and 4 of nine factors. Deleting
A, B or C produces solution (k); D, E, F, G, H, J, K and L give solution (l).

(p) The intrablock group is

I       ABEGJL  ACEGHK  BCEFJK
DGJK    ABFGHJ  ACFGKL  BCHJKL
EFHL    ABDEKL  BCDEFG  ACDEHJ
ABDFHK  ACDFJL  BCDGHL  DEFGHJKL

The group of confounded interactions may be generated from ABC, ADG, AEF, AHL, AJK,
BDJ and BFL. There are 13 interactions of three factors, 26 of four, 24 of five and six, 26 of
seven, 13 of eight and 1 of all eleven factors. Deleting A, B or C gives solution (k); the eight
other letters lead to solution (m).

(q) The intrablock group is

I      BCGHL   ACDEHJ  ABDEGJL
DGJK   DEFHL   ACEGHK  ACDFGKL
ABEKL  ABDFHK  BCDEFG  BCDHJKL
ACFJL  ABFGHJ  BCEFJK  EFGHJKL

The confounded interactions may be generated from ABC, AEF, AHL, AJK, BEH, BGK,
and CFH. There are 13 interactions of three, 25 of four and five, 27 of six, 23 of seven, 10 of
eight, 3 of nine and 1 of ten factors. Deleting A, B, C, E, F or H leads to solution (l); D, G, J
or K to (m), while deleting L leads to (n).


Twelve factors in blocks of sixteen plots
With twelve factors there are only two solutions. 

(r) The intrablock group is

I       ACFHJL  BCFGKL  CDGHLM
ABEFLM  ADFGJM  BDEGJL  ABCDEFGH
ABGHJK  ADEHKL  BDFHKM  ABCDJKLM
ACEGKM  BCEHJM  CDEFJK  EFGHJKLM

The 255 interactions confounded may be generated from the eight DEM, AEJ, AFK, AGL,
AHM, BEK, BGM and CEL. There are 16 interactions of three factors, 39 of four, 48 of five,
six and seven, 39 of eight, 16 of nine and 1 of all twelve factors. Deletion of any letter leads to
solution (o).


(s) The intrablock group is

I       ABGHJK  BCFGKL   ACDGHLM
DEHKL   ACEGKM  ABDEGJL  BCDEFGH
DFGJM   ACFHJL  ABDFHKM  BCDJKLM
ABEFLM  BCEHJM  ACDEFJK  EFGHJKLM

The group of interactions confounded is that generated by ABC, AFG, AEH, AJM, AKL,
BEK, BFJ and CEL. There are 17 interactions of 3 factors, 38 of four, 44 of five, 52 of six,
54 of seven, 33 of eight, 12 of nine, 4 of ten and 1 of eleven factors. Deleting A, B or C gives
solution (o), deleting D gives (p), while the other eight letters all lead to solution (q).


Thirteen factors 

With 13 factors we have one solution. 

(t) The intrablock group is

I       HJKLMN   ABEFJKN  ACEGJLN
BCDEHJ  ABDGHLN  ACDFJLM  BCDEKLMN
BCFGKL  ABDGJKM  ACDFHKN  BCFGHJMN
DEFGMN  ABEFHLM  ACEGHKM  DEFGHJKL

The 511 interactions confounded by this arrangement may be generated from nine chosen
from them, such as ABC, ADE, AFG, AHJ, AKL, AMN, BDF, BHK and CDG. There are
22 interactions of three factors, 55 of four, one-thirteenth of the total in each case, 72 of five,
96 of six, 116 of seven, 87 of eight, 40 of nine, 16 of ten, 6 of eleven, and 1 of twelve factors.
Deleting the factor A we have solution (r); deleting any of the other twelve letters leads to
solution (s).



Fourteen factors

With 14 factors we have the one solution.

(u) With the intrablock group

I        ACEGJLN  HJKLMNO   ACEGHKMO
ABDGJKM  BCDEHJO  ABDGHLNO  BCDEKLMN
ABEFHLM  BCFGKLO  ABEFJKNO  BCFGHJMN
ACDFHKN  DEFGMNO  ACDFJLMO  DEFGHJKL

The 1023 interactions confounded may be generated from ABC, ADE, AFG, AHJ, AKL,
AMN, BDF, BHK, BMO and CDG. They comprise 28 interactions of three factors, 77 of four, or
one-thirteenth of each of these classes, 112 of five, 168 of six, 232 of seven, 203 of eight, 112 of
nine, 56 of ten, 28 of eleven and 7 of twelve factors. Deletion of any letter leads necessarily
to solution (t).

Fifteen factors

With 15 factors we have the complete solution from which all the others have been derived
by deletion; the intrablock group is

(v)  I         ABEFJKNO  ACEGJLNP  BCFGKLOP
     ABDGHLNO  ACDFHKNP  BCDEHJOP  DEFGHJKL
     ABDGJKMP  ACDFJLMO  BCDEKLMN  DEFGMNOP
     ABEFHLMP  ACEGHKMO  BCFGHJMN  HJKLMNOP

The 2047 interactions confounded may be generated from ABC, ADE, AFG, AHJ, AKL,
AMN, AOP, BDF, BHK, BMO and CDG. There are 35 triple interactions among those
confounded, and it illustrates the manifold connexions among these combinatorial problems, that
these constitute the most symmetrical of the solutions of the problem of choosing 35 blocks of
3 from among 15 varieties such that each pair of varieties occurs equally frequently in the same
block (Fisher, 1940).

There are also 35 interactions of twelve factors comprising the complementary solution; of
interactions of four and of eleven factors 105 are confounded; of five and ten 168, of six and
nine 280, and of seven and eight factors 435 each.
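
The distribution of the 2047 confounded interactions by number of factors can be reproduced by enumeration. The following Python sketch is illustrative only: it assumes the bitmask encoding used earlier and takes four independent elements of the intrablock group, ABEFJKNO, ACEGJLNP, ABDGHLNO and HJKLMNOP, as generators.

```python
from collections import Counter

FACTORS = "ABCDEFGHJKLMNOP"   # fifteen factors; the letter I is not used

def mask(word):
    """Encode an interaction as a bitmask over FACTORS."""
    return sum(1 << FACTORS.index(ch) for ch in word)

generators = [mask(w) for w in ["ABEFJKNO", "ACEGJLNP", "ABDGHLNO", "HJKLMNOP"]]

# An interaction is confounded when it has an even number of letters in common
# with each generator (and hence with every element of the intrablock group).
confounded = [m for m in range(1, 1 << len(FACTORS))
              if all(bin(m & g).count("1") % 2 == 0 for g in generators)]

tally = Counter(bin(m).count("1") for m in confounded)
print(len(confounded))
print(sorted(tally.items()))

triples = ["ABC", "ADE", "AFG", "AHJ", "AKL", "AMN", "AOP", "BDF", "BHK", "BMO", "CDG"]
print(all(mask(t) in set(confounded) for t in triples))
```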

REFERENCES 

M. M. Barnard (1936). An enumeration of the confounded arrangements in the 2x2x2... factorial designs.
J. Roy. Statist. Soc. Suppl. 3, 195-202.

R. A. Fisher (1940). An examination of the different possible solutions of a problem in incomplete blocks.
Ann. Eugen., Lond., 10, 52-75.

F. Yates (1937). The design and analysis of factorial experiments. Tech. Commun. Bur. Soil Sci., Harpenden, 
no. 36. 



40

A SYSTEM OF CONFOUNDING FOR FACTORS
WITH MORE THAN TWO ALTERNATIVES, GIVING
COMPLETELY ORTHOGONAL CUBES AND
HIGHER POWERS


See Author's Note, Paper 39.


Reprinted from Annals of Eugenics, Vol. XII, Pt. IV, pp. 283-290, 1945.


By R. A. FISHER
1. Introduction 

In 1942 I called attention to the connexion between the theory of Abelian groups and the relations
recognizable in the choice of interactions for confounding, or of treatment-combinations for use
in the same block, when it is required to subdivide a complete replication into blocks without loss
of information save on unimportant interactions.

The theory was developed in terms of factors having only two alternatives. This is, of course,
the case of the greatest importance in practice, though factors with three alternatives also must
frequently be used. The basic position developed so far is as follows:

(i) If there is no partial confounding, the interactions confounded, one less in number than the 
blocks, form with the identity a subgroup. 

(ii) Any operator transforming one treatment to another treatment in the same block is ortho¬ 
gonal to the entire subgroup confounded, and these operators constitute the entire subgroup 
orthogonal to it. This is called the intrablock subgroup. It supplies the contents of the block 
containing the ‘ control ’, and, by multiplication, that containing any chosen treatment. The form 
of confounding adopted is often most concisely specified by means of the intrablock subgroup, 
or of its generators. 

On the basis of these ideas it was possible to demonstrate the remarkable proposition that:
Using blocks of $2^n$ plots, it is possible to test all combinations of so many as $(2^n - 1)$ factors, in such
a way that all interactions confounded ($2^{2^n - 1 - n} - 1$ in number) shall involve not less than three
factors each.

(iii) Methods of confounding using fewer factors than the maximum possible may be found
by eliminating unwanted factors. In the intrablock subgroup the symbol of any factor to be
eliminated is simply deleted from any combination in which it occurs. In the subgroup
confounded the whole combination containing any such symbol is deleted. Practical applications
have been illustrated in the third edition of the Design of Experiments.
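
The proposition for blocks of $2^n$ plots can be illustrated computationally. The sketch below is not taken from the 1942 paper; it assumes the two-level case of the construction described in §2 of the present note, in which each of the $2^n - 1$ factors is identified with a distinct non-zero binary vector of length $n$, and the $i$-th intrablock generator contains the factors whose $i$-th coordinate is 1. Function names are hypothetical.

```python
from itertools import combinations

def intrablock_generators(n):
    """Factor j (1 .. 2**n - 1) is the binary vector of j; generator i holds
    the factors whose i-th coordinate is 1."""
    factors = range(1, 1 << n)
    return [frozenset(j for j in factors if j >> i & 1) for i in range(n)]

def is_confounded(interaction, generators):
    """Confounded means an even number of factors in common with every generator."""
    return all(len(interaction & g) % 2 == 0 for g in generators)

def smallest_confounded(n):
    """Smallest number of factors (searching up to three) in a confounded interaction."""
    gens = intrablock_generators(n)
    for size in range(1, 4):
        for combo in combinations(range(1, 1 << n), size):
            if is_confounded(frozenset(combo), gens):
                return size
    return None

for n in (2, 3, 4):
    print(n, 2 ** n - 1, smallest_confounded(n))
```

No main effect or two-factor interaction is confounded because two distinct non-zero vectors always differ in some coordinate, while any three factors whose vectors sum to zero give a confounded triple.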

In my previous paper it was pointed out that, since, for example, blocks of eight would suffice 
for seven factors of two alternatives each, it follows that blocks of 27 must suffice for seven factors 
of three alternatives each. Indeed, several different arrangements suggested themselves. In the 
present note it will appear that blocks of 27 are large enough to accommodate a selection of
treatment combinations out of $3^{13}$, without confounding any interaction of less than three factors.
I have not, however, listed the various arrangements using 5-12 factors obtainable by deletion. 

2. Factors with p alternatives, where p is a prime

To arrange the combinations of

$(p^n - 1)/(p - 1)$

factors, each of $p$ alternatives, $p$ being a prime number, in blocks of $p^n$ units each, establish
arbitrarily a 1:1 correspondence between the factors, or the letters representing them, each with
one subgroup of order $p$ of a group of order $p^n$, and with a particular element of that subgroup.
E.g. taking $p = 3$, $n = 3$, we may use 13 factors $A \ldots M$ related as follows:

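Table 1. Association of thirteen factors with independent elements of an Abelian group

A  $\alpha$         C  $\alpha\beta$        E  $\gamma$         G  $\alpha\gamma^2$      I  $\beta\gamma^2$       K  $\alpha^2\beta\gamma$    M  $\alpha\beta\gamma^2$
B  $\beta$          D  $\alpha\beta^2$      F  $\alpha\gamma$   H  $\beta\gamma$         J  $\alpha\beta\gamma$   L  $\alpha\beta^2\gamma$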

Let $S$ be any element of the group and $X$ any element to be considered in relation to $S$. The
sum of the products of the powers of $\alpha, \beta, \gamma, \ldots$ in the expressions in Greek letters of $S$ and $X$, on
dividing by $p$, leaves a remainder $i$ equal to $0, 1, 2, \ldots$, or $(p - 1)$. The combination of Latin letters
involving each $X$ to the corresponding power $i$ shall be chosen as the intrablock interaction
corresponding with $S$.

(i) The combination corresponding with the product of the Greek representations of $S$ and
$S'$ is the interaction of the combinations corresponding with $S$ and $S'$. For the index of any letter
$X$, in relation to the product $SS'$, will, if obtained as explained above, differ by zero or by a
multiple of $p$ from the sum of the indices in relation to $S$ and $S'$; and this is sufficient to demonstrate
the proposition.

(ii) The combinations chosen by this method form a subgroup of Latin letters, isomorphic
with the entire group formed of Greek letters. This shall be used as the intrablock subgroup.

(iii) If $u$ is the power of $X$ in any interaction of Latin letters orthogonal to the combination
corresponding with $S$, then $\Sigma(iu)$ is zero or a multiple of $p$.

(iv) If $S_1$ is the element of which the Greek representation is $\alpha$, then for any Latin letter $X$,
the power $i$ is the power to which $\alpha$ is raised in the representation of $X$. Hence the power of $\alpha$
in the Greek product of the interaction in which $X$ is raised to the power $u$ is $\Sigma(iu)$, and the
necessary and sufficient condition that the selection is orthogonal to $S_1$ is that this Greek product
does not involve $\alpha$.

(v) Any interaction orthogonal to all the combinations corresponding with $S_1, S_2, \ldots$ is such
that the product of the Greek representations of its terms reduces to the identity. Such
interactions are orthogonal to the entire intrablock subgroup and constitute the subgroup confounded.

(vi) The subgroup confounded contains interactions of not less than three factors, for no one 
element or product of two elements belonging to different cycles can reduce to the identity. 

Applying this process to the problem of dividing $3^{13}$ different treatments in blocks of 27, so
that the $3^{10} - 1$ interactions confounded all involve at least three factors, we construct the
following table, based on the arbitrary correspondences set out above:

Factor   Representative element   Corresponding element of intrablock subgroup
         of Greek group

A        $\alpha$                 $A^1B^0C^1D^1E^0F^1G^1H^0I^0J^1K^2L^1M^1 = ACDFGJK^2LM$
B        $\beta$                  $A^0B^1C^1D^2E^0F^0G^0H^1I^1J^1K^1L^2M^1 = BCD^2HIJKL^2M$
E        $\gamma$                 $A^0B^0C^0D^0E^1F^1G^2H^1I^2J^1K^1L^1M^2 = EFG^2HI^2JKLM^2$


From these three elements the intrablock subgroup may be generated without reference to the 
arbitrary correspondence established with the Greek group. The thirteen cycles are set out in full 
in §3 (Table 3). 
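
The rule for forming an element of the intrablock subgroup can be written out in a few lines. This is a minimal sketch, assuming the correspondences of Table 1 are stored as exponent triples of $(\alpha, \beta, \gamma)$; the names `GREEK`, `intrablock_element` and `show` are illustrative.

```python
P = 3

# Table 1: exponents of (alpha, beta, gamma) associated with factors A .. M.
GREEK = {
    "A": (1, 0, 0), "B": (0, 1, 0), "C": (1, 1, 0), "D": (1, 2, 0),
    "E": (0, 0, 1), "F": (1, 0, 1), "G": (1, 0, 2), "H": (0, 1, 1),
    "I": (0, 1, 2), "J": (1, 1, 1), "K": (2, 1, 1), "L": (1, 2, 1),
    "M": (1, 1, 2),
}

def intrablock_element(s):
    """Power of each factor: sum of products of Greek exponents, reduced mod p."""
    return {f: sum(a * b for a, b in zip(s, x)) % P for f, x in GREEK.items()}

def show(element):
    """Write an element as a product of Latin letters, e.g. ACDFGJK^2LM."""
    return "".join(f if i == 1 else f"{f}^{i}" for f, i in element.items() if i)

for name, s in [("alpha", (1, 0, 0)), ("beta", (0, 1, 0)), ("gamma", (0, 0, 1))]:
    print(name, show(intrablock_element(s)))
```

Calling `intrablock_element` on the remaining exponent triples of Table 1 gives representatives of the other ten cycles.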




Any interaction confounded, such as $ABC^2$, is orthogonal to these three, and therefore to the
whole subgroup generated from them. There are $\tfrac{1}{6}(26 \cdot 24)$ confounded interactions of three factors;
that is, 104 interactions in 52 cycles.

Similarly, with blocks of 49, we may use 8 factors with 7 alternatives each, setting up the
correspondence

A $\alpha$   B $\beta$   C $\alpha\beta$   D $\alpha\beta^2$   E $\alpha\beta^3$   F $\alpha\beta^4$   G $\alpha\beta^5$   H $\alpha\beta^6$

Then the element of the intrablock subgroup corresponding with $\alpha$ will be

$ACDEFGH$,

and that corresponding with $\beta$ will be

$BCD^2E^3F^4G^5H^6$,

from which the intrablock subgroup, consisting of the identity and eight cycles of 6, may be
generated. Interactions of three factors which are confounded, such as $ABC^6$, number
$\tfrac{1}{6}(48 \cdot 42) = 336$, in 56 cycles. In all, $7^6 - 1 = 117{,}648$ interactions are confounded in 19,608 cycles.
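
The counts of confounded three-factor interactions quoted for $p = 3$ and $p = 7$ can be confirmed by direct enumeration. In the sketch below, which is an illustration rather than the paper's own method, each factor is assumed to be represented by the exponent vector whose first non-zero entry is 1 (a different choice of representative within a subgroup, such as $\alpha^2\beta\gamma$ for K in Table 1, does not alter the count). An interaction is counted as confounded when its Greek product reduces to the identity.

```python
from itertools import combinations, product

def point_representatives(p, n):
    """One exponent vector per subgroup of order p: first non-zero entry equal to 1."""
    reps = []
    for v in product(range(p), repeat=n):
        first = next((x for x in v if x), None)
        if first == 1:
            reps.append(v)
    return reps

def count_three_factor(p, n):
    """Number of confounded interactions involving exactly three factors."""
    reps = point_representatives(p, n)
    count = 0
    for trio in combinations(reps, 3):
        for powers in product(range(1, p), repeat=3):
            total = [sum(u * x[k] for u, x in zip(powers, trio)) % p
                     for k in range(n)]
            if not any(total):
                count += 1
    return count

print(len(point_representatives(3, 3)), count_three_factor(3, 3))
print(len(point_representatives(7, 2)), count_three_factor(7, 2))
```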


3. The orthogonal cube of 3

The system of confounding for 13 factors, each of three levels, in blocks of 27 is equivalent to the
solution of the problem of an orthogonal cube of side 3 in 10 alphabets. If the first co-ordinate of
the position of a point in this cube takes the values 0, 1 and 2, according to the power of A in any
element of the intrablock subgroup, the second co-ordinate being likewise determined by the
power of B, and the third by that of E, the power of any one of the remaining 10 letters specifies
the values, 0, 1 or 2, of 10 distinct entries corresponding to each point of the cube. These are set
out in three squares, representing successive layers, as follows:

Table 2 



Corresponding with rows and columns of a Latin or Graeco-Latin square, the cube consists of
plane strata of 9 points each, which may be referred to as rows, columns and layers respectively,
for the sake of expressing its properties in language analogous to that used for the squares. In
every one of the 10 sites, each number occurs thrice in every row, column or layer, and thrice with
each possible number at each of the remaining 9 sites.

In other words we may say that each of 13 principles of classification is made to divide 27
objects into 3 sets of 9, in each of which all the classes of every other principle of classification are
equally represented.

It should be noted that sets of three objects alike in any two respects are alike also in two other 
characters, but differ in the remaining nine. 

Since the orthogonal cube of 3 affords 13 ways of dividing 27 objects into 3 groups of 9 each in
such a way that every two objects fall 4 times into the same class, it provides at once a solution of
the problem in incomplete blocks of arranging 27 varieties in blocks of 9, using 13 replications
and 39 blocks.

Table 3. Intrablock subgroup for blocks of 27; the identity with 13 cycles of 2

$ACDFGJK^2LM$
$BCD^2HIJKL^2M$
$ABC^2FGHIJ^2M^2$
$AB^2D^2FGH^2I^2KL^2$
$EFG^2HI^2JKLM^2$
$ACDEF^2HI^2J^2L^2$
$ACDE^2G^2H^2IKM^2$
$BCD^2EFG^2H^2J^2K^2$
$BCD^2E^2F^2GI^2LM^2$
$ABC^2EF^2H^2KLM$
$A^2BDEGH^2JL^2M^2$
$AB^2D^2EF^2IJK^2M^2$
$ABC^2E^2G^2I^2JK^2L^2$

Table 4. 40 treatments in 40 blocks of 13

C  b  e  h  i  o  o' r  r' u  u' v  v'
C  c  e  j  m  p  p' r  r' w  w' z  z'
b  e  h  i  n  p  q  s  t  w  x' y  z
c  e  j  m  n  o' q' s  t  u' v' x  y'
b  e  h  i  n' p' q' s' t' w' x  y' z'
c  e  j  m  n' o  q  s' t' u  v  x' y
C  a  e  f  g  n  n' r  r' s  s' t  t'
C  c  f  i  k  p  p' s  s' v  v' x  x'
a  e  f  g  o  p  q' u  v  w  x  y' z
c  f  i  k  n' o  q  r  t  u' w  y' z'
a  e  f  g  o' p' q  u' v' w' x' y  z'
c  f  i  k  n  o' q' r' t' u  w' y  z
C  a  b  c  d  n  n' o  o' p  p' q  q'
C  c  g  h  l  p  p' t  t' u  u' y  y'
a  b  c  d  r  s  t' u  v' w  x  y  z'
c  g  h  l  n  o' q' r  s' v  w  x' z'
a  b  c  d  r' s' t  u' v  w' x' y' z
c  g  h  l  n' o  q  r' s  v' w' x  z
C  a  h  j  k  n  n' u  u' w  w' x  x'
a  b  c  d  e  f  g  h  i  j  k  l  m
a  h  j  k  o  p  q' r' s' t  v' y  z'
C  d  e  k  l  q  q' r  r' x  x' y  y'
a  h  j  k  o' p' q  r  s  t' v  y' z
d  e  k  l  n  o  p' s  t  u  v  w' z'
C  a  i  l  m  n  n' v  v' y  y' z  z'
d  e  k  l  n' o' p  s' t' u' v' w  z
a  i  l  m  o  p  q' r  s  t' u' w' x'
C  d  f  h  m  q  q' s  s' u  u' z  z'
a  i  l  m  o' p' q  r' s' t  u  w  x
d  f  h  m  n  o  p' r' t' v' w  x' y'
C  b  f  j  l  o  o' s  s' w  w' y  y'
d  f  h  m  n' o' p  r  t  v  w' x  y
b  f  j  l  n  p  q  r' t' u' v  x  z'
C  d  g  i  j  q  q' t  t' v  v' w  w'
b  f  j  l  n' p' q' r  t  u  v' x' z
d  g  i  j  n  o  p' r  s' u' x  y  z
C  b  g  k  m  o  o' t  t' x  x' z  z'
d  g  i  j  n' o' p  r' s  u  x' y' z'
b  g  k  m  n  p  q  r  s' u  v' w' y'
b  g  k  m  n' p' q' r' s  u' v  w  y


The sets of 3 also, into which the 27 objects are divided by a double classification, supply a
solution of the problem of testing 27 varieties in 13 replications in 117 blocks of 3 each, since sets
of 4 principles of classification, any pair of which make the same subdivision into 9 threes, can be
chosen in 13 ways. These sets of 4 correspond 1 to 1 with the 13 primary principles of classification,
the relation being that between any element of the Abelian group to which the factors correspond
and the cycles of the subgroup orthogonal to that element; thus, to the single principle of
classification A there corresponds the set of four cycles, BEHI, with B there corresponds AEFG,
and so on.




In addition to these two solutions, there are two more randomized block solutions to be derived 
from the orthogonal cube. If we designate the elements of the intrablock subgroup set out above 


Table 5. 40 treatments in 130 blocks of 4

C a n n'    C b o o'    C c p p'    C d q q'    C e r r'    C f s s'    C g t t'
a o p q'    b n p q     c n o' q'   d n o p'    e n s t     f o w y'    g o x z
a o' p' q   b n' p' q'  c n' o q    d n' o' p   e n' s' t'  f o' w' y   g o' x' z'
a r s t'    b r u v'    c r w z'    d r x y     e o u v     f n r' t'   g n r s'
a r' s' t   b r' u' v   c r' w' z   d r' x' y'  e o' u' v'  f n' r t    g n' r' s
a u w x     b s w y     c s v' x    d s u z'    e p w z     f p v x     g p u y'
a u' w' x'  b s' w' y'  c s' v x'   d s' u' z   e p' w' z'  f p' v' x'  g p' u' y
a v' y z'   b t x' z    c t u' y'   d t v w'    e q x' y    f q' u z    g q' v w
a v y' z    b t' x z'   c t' u y    d t' v' w   e q' x y'   f q u' z'   g q v' w'

b e h i     a e f g     d e k l     c e j m     a b c d     b g k m     b f j l
a i l m     a h j k     d g i j     c f i k     c g h l     d f h m

C h u u'    C i v v'    C j w w'    C k x x'    C l y y'    C m z z'
h n w x'    i n y z     j o' s y'   k o t z'    l o s w'    m o' t x
h n' w' x   i n' y' z'  j o s' y    k o' t' z   l o' s' w   m o t' x'
h o r' v'   i o r u'    j p r' z'   k q r y'    l q r' x    m p r w'
h o' r v    i o' r' u   j p' r z    k q' r' y   l q' r x'   m p' r' w
h p t y     i p s x'    j n u' x    k n u w'    l n v z'    m n v' y'
h p' t' y'  i p' s' x   j n' u x'   k n' u' w   l n' v' z   m n' v y
h q s z     i q t w     j q' t v'   k p' s v    l p' t u    m q' s u'
h q' s' z'  i q' t' w'  j q t' v    k p s' v'   l p t' u'   m q s' u



Table 6. Incomplete blocks of 11-13 replications, existing or possibly existing

 r     v     b     k    $\lambda$
11    23    23    11    5
11    45    99     5    1
11    45    55     9    2
11    56    56    11    2
11   100   110    10    1   O.S. }
11   111   111    11    1   O.S. }
12    19    57     4    2
12    22    33     8    4
12    25   100     3    1
12    33    44     9    3
12    37   111     4    1
12    61   122     6    1
12    67    67    12    2
12   121   132    11    1   O.S.
12   133   133    12    1   O.S.
13    27   117     3    1   O.C.
13    27    39     9    4   O.C.
13    27    27    13    6
13    40   130     4    1   O.C.
13    40    52    10    3
13    40    40    13    4   O.C.
13    53    53    13    3
13    66   143     6    1
13    66    78    11    2
13    79    79    13    2
13   144   156    12    1   O.S. }
13   157   157    13    1   O.S. }



(Table 3) by the letters n to z, and their squares by the letters n' to z', the identity being represented
by C, while the 13 first letters of the alphabet stand for the 13 principles of classification in
accordance with which these 27 have been subdivided, we may associate each group of 9 with the four
letters of the subgroup orthogonal to the principle of classification from which it was derived, thus
making 39 blocks of 13 letters each, which with a fortieth block consisting of the first 13 letters of
the alphabet themselves, involves each of the letters 13 times and every pair 4 times in association,
thus providing 40 treatments in 40 blocks of 13 with 13 replications.

Equally, each set of 3 obtained by a double subdivision may be associated with the symbol for
the principle of classification orthogonal to the two already used, to make 117 blocks of 4 with
which are associated 13 more blocks of 4, representing the cycles of the subgroups orthogonal to
each classification used. In this case, 40 treatments are subdivided in 130 blocks of 4 in 13
replications such that every 2 treatments occur once in the same block. Table 6 shows these 4 solutions
derivable from the orthogonal cube of 3 in a list of the possible incomplete block solutions with
11-13 replications. The two pairs of solutions related to orthogonal squares of sides 10 to 12 have
been included as fulfilling the arithmetical condition, although presumably no combinatorial
solutions exist for them.
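
The "arithmetical condition" can be checked mechanically. The sketch below assumes the usual two counting identities for a balanced incomplete block design with t treatments, b blocks of k plots, r replications and every pair together λ times, namely bk = tr and λ(t - 1) = r(k - 1). The function name is illustrative only, and passing the check says nothing about whether a combinatorial solution exists.

```python
def satisfies_arithmetical_conditions(t, b, k, r, lam):
    """True when b*k == t*r and lam*(t - 1) == r*(k - 1)."""
    return b * k == t * r and lam * (t - 1) == r * (k - 1)

# (t, b, k, r, lam): the four solutions derived from the orthogonal cube of 3
cube_solutions = [
    (27, 117, 3, 13, 1),
    (27, 39, 9, 13, 4),
    (40, 130, 4, 13, 1),
    (40, 40, 13, 13, 4),
]
results = [satisfies_arithmetical_conditions(*row) for row in cube_solutions]
print(results)
```

The same function may be applied to any row of Table 6; it tests necessity only.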


4. Factors having a number of levels which is a power of a prime

In § 2 it was shown that, using factors having any prime number p of different levels, blocks of
$p^s$ units will suffice for use with

$$(p^s - 1)/(p - 1)$$

different factors, without any interaction of less than three factors being confounded.

This proposition may be extended with full generality to factors having a number of variants,
$p^r$, which is a power of a prime. Using the fact that a field of $p^r$ symbols can be constructed
unambiguously subject to arithmetical operations, we shall show that blocks of $p^{rs}$ plots suffice
for use with

$$(p^{rs} - 1)/(p^r - 1)$$

factors, each at $p^r$ levels.

Let $A_1, A_2, \ldots, A_s$ be s field variables each taking $p^r$ values, then out of $p^{rs}$ combinations one and
only one has all values zero. Let

$$\sum_i a_i A_i$$

be any linear function of the variables A, such that not all the coefficients a are zero. Then the
number of sets of coefficients which may possibly be chosen is $p^{rs} - 1$.

Since, however,

$$\alpha \sum_i a_i A_i = \sum_i (\alpha a_i) A_i,$$

where α is any symbol of the field, it appears that each possible linear function is related by simple
multiplication with $p^r - 1$ others, of which one is zero; or, in other words, belongs to a set of
$p^r - 1$ non-zero functions.

There are, therefore, $(p^{rs} - 1)/(p^r - 1)$ different sets, and these are associated as indices with an
equal number of letters, or, otherwise stated, each is used to specify the level of application of an
equal number of experimental factors, in the $p^{rs} - 1$ different treatment combinations occurring
in the same block with the control.




To show that such an intrablock subgroup will confound no interaction of less than three factors; 
consider that if 

... 

specify any interaction, this interaction will be confounded if, and only if 

= 0 , 

where S stands for summation over all letters, for all combinations of A. This will be true when,
and only when, 

= b, for all j from 1 to a. 

If i could be zero for all letters save two, take them to be A and B; then for those two letters 
hj — -mj, for all j, 

or, 6 and a would belong to the same set, contrary to the construction. Hence no interaction of 
less than three factors has been confounded. 


5. Solution using 64 plots to a block, and 21 factors at 4 levels each

The abstract process of § 4 may be illustrated by the problem of the cube of 4 in 18 alphabets;
or, in other words, the subdivision of the 63 comparisons among 64 objects into 21 orthogonal sets
of 3, each being comparisons among 4 lots of 16 into which the whole may be divided.

The rules for addition and multiplication of the four field symbols, which will be written
0, 1, p, q, are shown below.


Addition table

     0  1  p  q

0    0  1  p  q
1    1  0  q  p
p    p  q  0  1
q    q  p  1  0


Multiplication table

     0  1  p  q

0    0  0  0  0
1    0  1  p  q
p    0  p  q  1
q    0  q  1  p


The 64 combinations of three field values consist of one in which they are all zero, and 21 sets 
of 3 such that one member of each set is a simple multiple of the others. These sets are: 


21 sets of coefficients

A   1  0  0        H   1  0  p        O   1  1  q
B   0  1  0        I   1  0  q        P   1  p  1
C   0  0  1        J   1  1  0        Q   1  p  p
D   0  1  1        K   1  p  0        R   1  p  q
E   0  1  p        L   1  q  0        S   1  q  1
F   0  1  q        M   1  1  1        T   1  q  p
G   1  0  1        N   1  1  p        U   1  q  q
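
The count of 21 sets can be reproduced directly from the two field tables. The following is a minimal sketch, not part of the original paper: the field of four symbols is stored as lookup strings copied from the addition and multiplication tables, and the helper names are hypothetical. Each non-zero combination is scaled so that its first non-zero coefficient is 1, which picks one representative per set.

```python
from itertools import product

# Field symbols in the order 0, 1, p, q; row x of each table is ADD[x] / MUL[x].
SYMBOLS = "01pq"
ADD = {"0": "01pq", "1": "10qp", "p": "pq01", "q": "qp10"}
MUL = {"0": "0000", "1": "01pq", "p": "0pq1", "q": "0q1p"}

def add(x, y):
    return ADD[x][SYMBOLS.index(y)]

def mul(x, y):
    return MUL[x][SYMBOLS.index(y)]

def normalise(vec):
    """Scale a non-zero vector so that its first non-zero coefficient is 1."""
    lead = next(c for c in vec if c != "0")
    inverse = next(s for s in SYMBOLS if mul(lead, s) == "1")
    return tuple(mul(inverse, c) for c in vec)

nonzero = [v for v in product(SYMBOLS, repeat=3) if any(c != "0" for c in v)]
classes = sorted({normalise(v) for v in nonzero},
                 key=lambda v: [SYMBOLS.index(c) for c in v])
print(len(nonzero), len(classes))  # 63 21
```

Each of the 21 representatives stands for the three non-zero multiples of itself, giving 63 = 21 x 3 combinations in all.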


We now take any one of these and find the sum of the products of the coefficients with each of
the 21 in turn, to find the index of the corresponding letter in the corresponding treatment in the
block containing the ‘control’. Instead of $a^1$, $a^p$, $a^q$ I shall now write a, a', a''. We find thus the
21 treatment combinations, each representing a set of 3 which, with the control, occupy a single
block:

(aghijklmnopqrstu)
(bdefjk'l''mnop'q'r's''t''u'')
(cde'f''gh'i''mn'o''pq'r''st'u'')
(bce''f'gh'i''jk'l''n''o'p''rs't)
(bc'd''e'g'h''ijk'l''m''n'qr''su')
(bc''d'f''g''hi'jk'l''m'o''pq''t'u)
(acde'f''h''i'jkln''o'q''r't''u')
(ac'd'e''fg''h'jklm''n'p''q's''t')
(ac''d''ef'g'i''jklm'o''p'r''s'u'')
(abdefghik''l'p''q''r''s't'u')
(ab'd'e'f'ghij''k'm''n''o''p'q'r')
(ab''d''e''f''ghij'l''m'n'o's''t''u'')
(abce''f'h''i'k''l'mn'o''p'qs''u)
(abc'd''e'g''h'k''l'm'n''opr'tu'')
(abc''d'f''g'i''k''l'm''no'q'rst'')
(ab'cd''fh''i'j''k'm'np''rst'u'')
(ab'c'ef''g''h'j''k'mo'qr''s't''u)
(ab'c''de''g'i''j''k'n'opq''s''tu')
(ab''cd'eh''i'j'l''m''opq'r''s't)
(ab''c'df'g''h'j'l''no''p'q''rsu')
(ab''c''e'fg'i''j'l''mn''p''qr't'u)


These may alternatively be generated from the first three; thus the simple interaction of the 
first two gives the tenth as written above; the other interactions of these two give the eleventh 
and twelfth. 
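
The rule just described (index of a letter = sum of products of coefficients, taken in the field of four symbols) can be sketched as follows. This is an illustration under stated assumptions rather than the author's own procedure: the coefficient sets are read as A = (1,0,0), B = (0,1,0), C = (0,0,1), ..., U = (1,q,q), an unprimed letter means index 1, one prime means index p and two primes index q; all function and variable names are hypothetical.

```python
SYMBOLS = "01pq"
ADD = {"0": "01pq", "1": "10qp", "p": "pq01", "q": "qp10"}
MUL = {"0": "0000", "1": "01pq", "p": "0pq1", "q": "0q1p"}

# Coefficient sets of the 21 letters (assumed reading of the table of sets).
LETTERS = {
    "A": "100", "B": "010", "C": "001", "D": "011", "E": "01p", "F": "01q", "G": "101",
    "H": "10p", "I": "10q", "J": "110", "K": "1p0", "L": "1q0", "M": "111", "N": "11p",
    "O": "11q", "P": "1p1", "Q": "1pp", "R": "1pq", "S": "1q1", "T": "1qp", "U": "1qq",
}
MARK = {"1": "", "p": "'", "q": "''"}

def index(point, coeffs):
    """Sum of products of two coefficient sets, computed in the field."""
    total = "0"
    for x, y in zip(point, coeffs):
        term = MUL[x][SYMBOLS.index(y)]
        total = ADD[total][SYMBOLS.index(term)]
    return total

def treatment(point):
    """Treatment combination generated by one coefficient set."""
    parts = []
    for letter, coeffs in LETTERS.items():
        level = index(point, coeffs)
        if level != "0":
            parts.append(letter.lower() + MARK[level])
    return "(" + "".join(parts) + ")"

block = [treatment(point) for point in LETTERS.values()]
print(block[0])   # generated by A
print(block[9])   # generated by J = (1, 1, 0), the simple interaction of the first two
```

Every combination involves 16 of the 21 letters, since exactly five coefficient sets give index zero with any given set.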

The completely orthogonal cube of 4 may then be constructed directly by taking the four
phases of any factor, such as A, to specify position in one direction, B for a second and C for the
third, and indicating the phase of the 18 following letters by level designations such as 0, 1, 2, 3
in the 18 cells assigned to each of the 64 points of the cube. Such a cube could, of course, be used
to generate four distinct incomplete block solutions, these all having 21 replications.


SUMMARY

The system of confounding a number of factors each of only two alternatives, developed in the
previous paper, is here extended (i) to factors having any prime number of alternatives, and
(ii) to the case in which the number of alternatives is any power of a prime.

In the first case the factors may be chosen to correspond with subgroups of order p of an Abelian
group of order $p^s$, equal to the number of plots in each block. In the second, each factor
corresponds with a combination of s values, not all zero, each taking the $p^r$ values of the field, this being
the number of levels for each factor. Any selection which is a simple multiple of a second belongs
to the same factor; thus $(p^{rs} - 1)/(p^r - 1)$ different factors may be used, without confounding any
interaction of less than three factors.


REFERENCES 

R. A. Fisher (1942). The theory of confounding in factorial experiments in relation to the theory of groups.
Ann. Eugen., Lond., 11, 341-53.



41.394s 


41 

SOME COMBINATORIAL THEOREMS AND 
ENUMERATIONS CONNECTED WITH THE 
NUMBERS OF DIAGONAL TYPES OF A 
LATIN SQUARE 


AUTHOR’S NOTE 

The series of theorems in this paper forms a little essay in the art of
using the available mathematical apparatus (generating functions,
bipartitional functions, etc.) to solve a number of problems of
enumeration of some complexity. The author is convinced that more
comprehensive methods remain to be developed, but that this will
only be accomplished, and the accomplishment understood, by the
aid of a study of particular problems, such as those here discussed.

The proofs in Sections 4 and 7 have been made more explicit than
in the original, and some arithmetical slips in Table 3 have been
corrected.

The problem in Section 6 suggests a very general class of enumeration,
which so far as I know has not been discussed, namely: If a design
consists of n objects at n loci, given a permutation group for
loci, such that designs transformable into one another by any element
of the group are regarded as equivalent, to enumerate the designs
which can be made of objects corresponding with any given
partition of n.

A great many practical problems are of this general type.


Reprinted from Annals of Eugenics, Vol. XI, Pt. IV, pp. 395-401, 1942.



41.396 


SOME COMBINATORIAL THEOREMS AND ENUMERATIONS 
CONNECTED WITH THE NUMBERS OF DIAGONAL TYPES 
OF A LATIN SQUARE 

By R. A. FISHER

1. If objects are of a different kinds, each available in unlimited numbers, the number of ways
of choosing a set of n objects is

$$\frac{(n+a-1)!}{(a-1)!\,n!}.$$

For, if the selection consists of $n_1$ of the first kind, $n_2$ of the second, ..., $n_a$ of the last, to every
different selection there corresponds a linear arrangement of symbols, e.g. n noughts and a - 1
crosses, arranged so that the first cross follows $n_1$ noughts, the second follows $n_2$ more, while the
sequence ends with a series of $n_a$. Consequently, the number of ways of arranging such a sequence
is the number of ways of making the selection required.

The number enumerated is clearly the coefficient of $x^n$ in the expansion of the generating function

$$(1-x)^{-a}.$$
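
As a small numerical illustration of this first theorem (a sketch with hypothetical function names, not taken from the paper), the factorial expression, the series coefficient of $(1-x)^{-a}$, and a direct enumeration of selections can be compared for a = 4 kinds and n = 6 objects.

```python
from itertools import combinations_with_replacement
from math import factorial

def selections(a, n):
    """(n + a - 1)! / ((a - 1)! n!)"""
    return factorial(n + a - 1) // (factorial(a - 1) * factorial(n))

def series_coefficient(a, n):
    """Coefficient of x^n in (1 - x)^(-a), multiplying by 1/(1 - x) a times."""
    coeffs = [1] + [0] * n
    for _ in range(a):
        for i in range(1, n + 1):      # running sums = multiplication by 1/(1 - x)
            coeffs[i] += coeffs[i - 1]
    return coeffs[n]

a, n = 4, 6
brute = sum(1 for _ in combinations_with_replacement(range(a), n))
print(selections(a, n), series_coefficient(a, n), brute)  # 84 84 84
```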

2. If $P = (p_1^{\pi_1} p_2^{\pi_2} \cdots)$, $S(\pi) = \rho$,

is a partition of n, the number of ways of choosing n objects so that $\pi_1$ types are represented each
$p_1$ times, $\pi_2$ types $p_2$ times, etc., is

$$\frac{a!}{\pi_1!\,\pi_2! \cdots (a-\rho)!},$$

for this is the number of ways of assigning the a kinds of objects to the groups represented a given
number of times.

It may be noted that by multinomial expansion

$$(1 + x + x^2 + \cdots)^a = \sum \frac{a!}{\pi_1!\,\pi_2! \cdots (a-\rho)!}\, x^{\pi_1 p_1 + \pi_2 p_2 + \cdots},$$

where the summation is taken over all integral values of π. Hence

$$\sum_P \frac{a!}{\pi_1!\,\pi_2! \cdots (a-\rho)!}$$

is the coefficient of $x^n$ in the expansion of

$$(1-x)^{-a};$$

this supplies an alternative demonstration of the expression

$$\frac{(n+a-1)!}{(a-1)!\,n!},$$

previously obtained.

3. If $a_s$ is the number of types of weight s, then the number of ways of making a selection of
total weight n is the coefficient of $x^n$ in the expansion of

$$(1-x)^{-a_1}(1-x^2)^{-a_2}(1-x^3)^{-a_3}\cdots.$$

If we write $\phi(x) = a_1 x + a_2 x^2 + a_3 x^3 + \cdots$,

this generating function may be written alternatively in the form

$$\exp\{\phi(x) + \tfrac12\phi(x^2) + \tfrac13\phi(x^3) + \cdots\}.$$
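
The equivalence of the product form and the exponential form can be checked on a truncated series. The sketch below is illustrative only: it assumes the numbers of types are supplied as a dictionary `a` mapping weight s to $a_s$, uses exact fractions, and obtains the exponential of a power series E(x) with zero constant term from the standard recurrence $n c_n = \sum_{k=1}^{n} k e_k c_{n-k}$. All names are hypothetical.

```python
from fractions import Fraction

def product_form(a, n_max):
    """Coefficients of prod_s (1 - x^s)^(-a_s) up to x^n_max."""
    coeffs = [1] + [0] * n_max
    for s in range(1, n_max + 1):
        for _ in range(a.get(s, 0)):          # multiply by 1/(1 - x^s), a_s times
            for i in range(s, n_max + 1):
                coeffs[i] += coeffs[i - s]
    return coeffs

def exponential_form(a, n_max):
    """Coefficients of exp{phi(x) + phi(x^2)/2 + phi(x^3)/3 + ...} up to x^n_max."""
    expo = [Fraction(0)] * (n_max + 1)
    for j in range(1, n_max + 1):             # term phi(x^j) / j
        for s, a_s in a.items():
            if s * j <= n_max:
                expo[s * j] += Fraction(a_s, j)
    c = [Fraction(1)] + [Fraction(0)] * n_max
    for n in range(1, n_max + 1):             # n c_n = sum_k k e_k c_{n-k}
        c[n] = sum(k * expo[k] * c[n - k] for k in range(1, n + 1)) / n
    return c

a = {1: 2, 2: 1, 3: 3}                        # an arbitrary example
print(product_form(a, 8))
print([int(x) for x in exponential_form(a, 8)])
```

The product form is the more convenient for arithmetic, since it involves only repeated running sums of integers.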



41.30Sa 


4. If we imagine a variate which takes the value x with frequency $a_1$, the value
$x^2$ with frequency $a_2$, and so on, then

$$\phi(x),\ \phi(x^2),\ \ldots$$

will be the sums of the first, second, and higher powers of the variate, i.e.,

$$s_1,\ s_2,\ \text{etc.}$$

The product of the variate values of any selection will be x to the power of the
weight of the selection. Hence the monomial symmetric function of the variates,
G(P), corresponding with any partition P is the generating function the
coefficients of the powers of x in which give the frequencies with which different
weights occur in selections of type P. But, the bipartitional function $G_s(P, Q)$
is defined to give the coefficients of the expansion of G(P) in sums of powers.
That is,

$$G(P) = \sum_Q G_s(P, Q)\, s_{q_1}^{\chi_1} s_{q_2}^{\chi_2} \cdots,$$

the summation extending over all partitions Q. Hence

$$\sum_Q G_s(P, Q)\, \phi^{\chi_1}(x^{q_1})\, \phi^{\chi_2}(x^{q_2}) \cdots$$

is the generating function for weight of selections of type P, where Q is any
partition of the number of variates chosen, defined by

$$Q = (q_1^{\chi_1} q_2^{\chi_2} \cdots).$$

Combinatorily, as Sukhatme (1938) has shown, 

is the number of ways of making directed circuits q out of undivided parts p. 
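Since, summing for all partitions (P) of the same partible number,

$$\sum_P G_s(P, Q) = \frac{1}{q_1^{\chi_1} q_2^{\chi_2} \cdots \chi_1!\,\chi_2! \cdots},$$

it follows that the generating function for a given number of objects selected is

$$\sum_Q \frac{\phi^{\chi_1}(x^{q_1})\, \phi^{\chi_2}(x^{q_2}) \cdots}{q_1^{\chi_1} q_2^{\chi_2} \cdots \chi_1!\,\chi_2! \cdots},$$

where $Q = (q_1^{\chi_1} q_2^{\chi_2} \cdots)$ is any partition of the number of objects.

Thus the generating function for a choice of one object is $\phi(x)$. For two objects
we have

$$\tfrac12\phi^2(x) + \tfrac12\phi(x^2),$$

for three objects

$$\tfrac16\phi^3(x) + \tfrac12\phi(x)\phi(x^2) + \tfrac13\phi(x^3),$$

and so on.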




5. Branches. Let $a_s$ stand for the number of ways of arranging s lines in a connected figure, so
that after the first the others may follow in sequence, or branch off in any number from the end
of any previous line. Thus, with four lines we should have the arrangements

[Figure: the four arrangements of four lines]

Then, in making up branches of (s + 1) lines we may follow the first by a selection of any number
of branches containing a total number s; consequently, the sequence of numbers $a_s$ may be found
by equating the coefficients of $x^s$ in the identity

$$\sum_{s=1}^{\infty} a_s x^{s-1} = \prod_{s=1}^{\infty} (1 - x^s)^{-a_s},$$

or, symbolically, $\log\{\phi(x)/x\} = \phi(x) + \tfrac12\phi(x^2) + \tfrac13\phi(x^3) + \cdots$.

In arithmetical calculation, when each coefficient up to $a_s$ has been obtained, the generating
function, so far as it has been calculated, is multiplied by

$$(1 - x^s)^{-a_s},$$

to obtain a new series, including the next coefficient, the earlier coefficients being unaltered.
Table 1 shows the calculation up to branches of 16 lines, set out in a form from which it may
easily be extended to higher values.
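
The arithmetical process just described lends itself to a short program. The sketch below is an illustration, not the author's worksheet: it keeps the truncated series for $\phi(x)/x$ in a list and, for each s in turn, reads off $a_s$ and then multiplies by $(1 - x^s)^{-a_s}$ through $a_s$ passes of running sums. The function name is hypothetical.

```python
def branches(n_lines):
    """Return [a_1, ..., a_n]: a_(s+1) is the coefficient of x^s in prod_k (1 - x^k)^(-a_k)."""
    series = [1] + [0] * (n_lines - 1)   # coefficients of phi(x)/x found so far
    a = []
    for s in range(1, n_lines + 1):
        a_s = series[s - 1]              # already final: later factors cannot alter it
        a.append(a_s)
        for _ in range(a_s):             # multiply by (1 - x^s)^(-a_s)
            for i in range(s, n_lines):
                series[i] += series[i - s]
    return a

print(branches(16))
# [1, 1, 2, 4, 9, 20, 48, 115, 286, 719, 1842, 4766, 12486, 32973, 87811, 235381]
```

The printed values agree with the final column of Table 1.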



A similar problem arises in the chemistry of carbon compounds, in which, however, the
modification is introduced that the number of branches arising at any point is limited to three. The
recurrence relationship by which the number of branches may be enumerated, subject to this
limitation, is evidently

$$\frac{1}{x}\phi(x) = 1 + \phi(x) + \tfrac12\phi^2(x) + \tfrac12\phi(x^2) + \tfrac16\phi^3(x) + \tfrac12\phi(x)\phi(x^2) + \tfrac13\phi(x^3).$$

Table 2 shows the calculation of the first 16 terms of this series.
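
A recurrence of this kind can be iterated term by term, since the coefficient of $x^n$ on the right involves only coefficients of $\phi$ up to $x^n$. The following is a minimal sketch under that assumption, with exact fractions for the halves, thirds and sixths; the function and helper names are hypothetical and the products are recomputed at every step for simplicity rather than speed.

```python
from fractions import Fraction

def alkyl_radicals(n_terms):
    """Coefficients of x, x^2, ..., x^n_terms in phi(x), no more than three branches at a point."""
    phi = [Fraction(0)] * (n_terms + 1)      # phi[k] = coefficient of x^k
    phi[1] = Fraction(1)

    def mul(u, v):
        w = [Fraction(0)] * (n_terms + 1)
        for i, ui in enumerate(u):
            if ui:
                for j, vj in enumerate(v):
                    if i + j > n_terms:
                        break
                    w[i + j] += ui * vj
        return w

    def dilate(u, m):                        # u(x) -> u(x^m)
        w = [Fraction(0)] * (n_terms + 1)
        for i, ui in enumerate(u):
            if i * m <= n_terms:
                w[i * m] = ui
        return w

    for n in range(1, n_terms):
        phi2, phi3 = dilate(phi, 2), dilate(phi, 3)
        square = mul(phi, phi)
        cube = mul(square, phi)
        cross = mul(phi, phi2)
        # coefficient of x^n on the right-hand side (the constant 1 only feeds phi[1])
        phi[n + 1] = (phi[n] + square[n] / 2 + phi2[n] / 2
                      + cube[n] / 6 + cross[n] / 2 + phi3[n] / 3)
    return [int(c) for c in phi[1:]]

print(alkyl_radicals(16))
# [1, 1, 2, 4, 8, 17, 39, 89, 211, 507, 1238, 3057, 7639, 19241, 48865, 124906]
```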

Table 1. Calculation of the number of different branches to be made of n elements

Columns 1 to 10 show the series after multiplication by (1 - x^s)^(-a_s), s = 1, 2, ..., 10.

Power                                                                          Final values
of x     1    2    3     4      5      6       7       8       9      10         φ(x)/x

1        1                                                                            1
x        1                                                                            1
x^2      1    2                                                                       2
x^3      1    2    4                                                                  4
x^4      1    3    5     9                                                            9
x^5      1    3    7    11     20                                                    20
x^6      1    4   11    19     28     48                                             48
x^7      1    4   13    29     47     67     115                                    115
x^8      1    5   17    47     83    123     171     286                            286
x^9      1    5   23    61    142    222     318     433     719                    719
x^10     1    6   27    91    235    415     607     837    1123    1842           1842
x^11     1    6   33   125    341    741    1173    1633    2205    2924           4766
x^12     1    7   42   180    531   1301    2261    3296    4440    5878          12486
x^13     1    7   48   230    833   1983    4287    6587    9161   12037          32973
x^14     1    8   57   315   1269   3349    7741   13261   18981   25452          87811
x^15     1    8   69   411   1890   5570   12650   25875   39603   53983         235381


Table 2. Number of branches such that no more than four elements meet at any point.
Enumeration of alkyl radicals


Power of 

X 

I 

<l> 








4>lx 

o 

1 

2 


I 

I 

I 

-1 







X 

X 

2 

3 



2 

I 





*4 

4 

4 

4 



4 

2i 




I 

4 


8 

5 



8 

6 




i4 

i4 


17 

6 



17 

14 


I 


4i 

24 

4 

39 

7 



39 

33 




II 

6 


89 

8 



89 

80 


2 


284 

ii4 


211 

9 



211 

194 




73i 

274 

f 

507 

lO 



507 

478 


4 


190 

59 


1238 

II 



1238 

1188 




490 

141 


3057 

12 



3057 

2979I 


8* 


1265I 

327 

u 

7639 

*3 



7639 

7528 




3278 

796 


I924I 

14 



19241 

19161I 


I9t 


85134 

19294 


48865 

IS 



4886s 

49060 




221824 

4796 

24 

124906 


6. The number of ways of arranging the units of a partition, P, in a ring.

The number of ways of arranging the units of a partition, P, of the partible number t in open
sequence is

$$N_1 = \frac{t!}{p_1!\,p_2! \cdots}.$$

From this it follows that, provided the numbers p have no common factor, the number of ways of
arranging them in a closed sequence is

$$\frac{1}{t}\,\frac{t!}{p_1!\,p_2! \cdots},$$

since each closed sequence corresponds with t open sequences, in this case all different, formed by
breaking the ring at t different points.

If, however, the numbers p have any common factor f, the number of ways of forming an open
sequence consisting of a repetition of f equivalent parts is

$$N_f = \frac{(t/f)!}{(p_1/f)!\,(p_2/f)! \cdots},$$

and if p/f have further prime common factors, $F_1, F_2, \ldots$, the number of these which will consist
of a succession of $fF_1$ equivalent parts is $N_{fF_1}$, and of these again the number consisting of $fF_1F_2$
equivalent parts is $N_{fF_1F_2}$.

Consequently, in the enumeration of the open sequences, consisting of successions of only
f equivalent parts,

$$N_{fF_1F_2 \cdots F_s}$$

will appear with coefficient $(-)^s$; and in the enumeration of closed sequences with only f repetitions,
it will appear with coefficient

$$(-)^s \frac{f}{t}.$$

Hence, in the enumeration of all closed sequences, $N_f$ will be involved with coefficient

$$\frac{f}{t}\left(1 - \frac{1}{f_1}\right)\left(1 - \frac{1}{f_2}\right) \cdots,$$

where $f_1, f_2, \ldots$ are the prime factors of f. The total number of closed sequences is, therefore,

$$n(P) = \frac{1}{t} \sum_f \varphi(f)\, N_f,$$

where the summation is for all common factors of the parts, including unity, and

$$\varphi(f) = f\left(1 - \frac{1}{f_1}\right)\left(1 - \frac{1}{f_2}\right) \cdots.$$

For example, with the partition $(24, 12^2)$, the values of $N_f$ for all factors of 12 are given in Table 3.

The two last lines both give the total 269631,401703,980044 as the value of n(P), when
$P = (24, 12^2)$; the fourth line gives the terms of the formula, which need not individually be
integral; the fifth line shows the arrangements divided according to the highest symmetry each
shows, i.e.

$$\tfrac{1}{48}(N_1 - N_2 - N_3 + N_6), \quad \tfrac{2}{48}(N_2 - N_4 - N_6 + N_{12}),$$

and so on.
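
The counting rule of this section can be tried on small cases where every ring can be listed. The sketch below is illustrative and rests on two assumptions: that $N_f$ is the multinomial number of open arrangements of one of the f equivalent sections, and that the weight attached to $N_f$ is f(1 - 1/f_1)(1 - 1/f_2)..., computed here as the number of integers up to f prime to f. The function names are hypothetical, and the brute-force check is practicable only for a handful of units.

```python
from functools import reduce
from itertools import permutations
from math import factorial, gcd

def weight(f):
    """f (1 - 1/f_1)(1 - 1/f_2)...: the count of integers 1..f prime to f."""
    return sum(1 for k in range(1, f + 1) if gcd(k, f) == 1)

def open_sequences(parts, f):
    """N_f: open arrangements of a section holding parts/f units of each kind."""
    n = factorial(sum(parts) // f)
    for p in parts:
        n //= factorial(p // f)
    return n

def rings(parts):
    """Closed sequences: (1/t) * sum over common factors f of weight(f) * N_f."""
    t = sum(parts)
    g = reduce(gcd, parts)
    total = sum(weight(f) * open_sequences(parts, f)
                for f in range(1, g + 1) if g % f == 0)
    return total // t

def rings_brute_force(parts):
    """Enumerate arrangements and identify those differing only by rotation."""
    word = tuple(i for i, p in enumerate(parts) for _ in range(p))
    distinct = set()
    for perm in set(permutations(word)):
        distinct.add(min(perm[i:] + perm[:i] for i in range(len(perm))))
    return len(distinct)

print(rings((4, 2, 2)), rings_brute_force((4, 2, 2)))  # 54 54
```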



41.308ft 


Table 3. Form for the enumeration of rings composed of elements of a given partition


/ 

1 

2 

3 

4 

6 

12 

Nf 

12.467M7,283610,994800 

2498,640144 

900900 

18480 

420 

12 

•(/) 

1 

1 

2 

2 

2 

4 

JV/t(/)/48 

269631.401741,896725 

62.066008 

376S7)i 

770 

17« 

1 

With /-fold aymmetry 

269631.401689331983 

104,109219 

66280 

1639 

61 

3 



R. A. FISHER 


7. Rings of branches. The number of distinct directed rings of which the elements are branches, 
chosen so that the total weight is given, may be found by multiplying the number of arrangements 
for a given partition P, by the generating function for the number of ways of choosing elements 
of that partition. This gives the generating function, 





Hence, if $\psi(x) = \phi(x) + \tfrac12\phi^2(x) + \tfrac13\phi^3(x) + \cdots$,

or, symbolically, $-\log\{1 - \phi(x)\}$,

the generating function required is

$$\sum_{f=1}^{\infty} \frac{\varphi(f)}{f}\, \psi(x^f),$$

in which $\varphi(f) = f(1 - 1/f_1)(1 - 1/f_2)\cdots$, where $f_1, f_2, \ldots$ are the prime factors of f, and
of which the coefficient of $x^w$ will enumerate the number of rings of branches of total weight w.
It should be noted that this enumeration includes ‘rings’ of a single element, i.e. of simple branches.
The numbers of proper rings of two or more branches may be obtained by subtraction. The
coefficients of $\psi(x)$ and the enumeration of the numbers of ‘rings and branches’ are shown in
Table 4.

Table 4. Enumeration of rings consisting of branches


Power 


of X 





X 

a 

I 

I* 

* 



3 

3i 


f 


4 

7l 

1 



5 

i9l 




6 

48 

If 

I 


7 

124^ 




8 

323! 

3i 


* 

9 

8s9i 


2I 


lO 

2299t 

9} 



II 

62i6i^r 




12 

16917^ 

*4 

5i 

if 

*3 

46349A 




14 

127650* 

62! 



IS 

353256 


laf 


i6 

98 is 8 sH 

i6i« 

. 

3f 







Rings and 
branohes 

Proper 

rings 

0 






2 

X 






4 

2 






9 

5 

f 





20 

II 


i 

> 



51 

31 





125 

77 




f 


329 

214 

if 




f 

862 

576 




I 

2311 

1592 






6217 

4375 


i 



* 

16949 

12183 





H 

46350 

33864 



if 


1 

127714 

94741 

if 




A 

353272 

265461 

. 



f 

f 

981753 

746372 









41.399a 


but 

^Nf(P)G»iP,Q) 

is the coefficient of 

... . , ■ ■ 
in the expansion of 

(o' -{-yf + ...yif, 

and this is clearly zero, save when ^ x - t/f, when it is unity. 
Hence, 

EE- W/(P)C.(P, 0)*}i*{| ■ • ■ - S ^ 2 ; ♦'"(!') 



It will be seen, for example, that of weight 5 there are nine possible branches, and twenty
possible rings and branches. There must therefore be eleven possible proper rings. These are:

[Figure: the eleven proper rings of weight 5]

The configurations may also be denoted by giving letters to the termini of the constituent
lines in such order as to show which follow which in order round the ring, or from the branches to
their base. Thus the eleven configurations figured above may be denoted as follows:

[Figure: letter formulae for the eleven configurations]

Here any termination is shown by a single stop, recurrence by two dots. For clarity B has been
repeated in two cases, but with experience in the notation this is unnecessary. The same
configuration may evidently have more than one equivalent formula, but the formula determines the
configuration uniquely.

8. Diagonal types of a Latin square. In a Latin square written in the standard form, with corner
at A, the letters forming the diagonal possess a definite set of relationships, or configuration,
which is unchanged by any permutation of the rows, columns and letters which leaves the corner
element unchanged, and which does not permute the categories. Such a transformation is called
an intramutation.

Taking B to represent any letter used in the square, other than A, we may note that the row and
column containing B in the first column and row intersect on the diagonal at an element which
has some letter other than B. Consequently, the type of any diagonal of a Latin square may be
denoted by a sequence of letters in which B follows whatever letter is thus found on the diagonal.
Any A on the diagonal will form the commencement of a terminating branch, but the
configuration may consist wholly or partly of rings of branches. The total weight of the configuration
found in any diagonal of an n x n square is evidently n - 1, and the number of different diagonal
types on which a Latin square can possibly be built is the number of ways of selecting elements of
total weight n - 1 from those enumerated in Table 4.

Thus, if $b_s$ stands for the number of different rings or branches of weight s, the coefficient of
$x^{n-1}$ in the expansion of

$$(1 - x)^{-b_1}(1 - x^2)^{-b_2}(1 - x^3)^{-b_3} \cdots$$

will be the number of possible diagonal types of an n x n Latin square.
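
The whole chain, from branches to rings of branches to diagonal types, can be sketched in a few lines. This is an illustration under stated assumptions, not the author's working: the numbers of branches $a_s$ are found by successive multiplication by $(1 - x^s)^{-a_s}$; the numbers $b_s$ of rings and branches are taken as the coefficients of $\sum_f (\varphi(f)/f)\,\psi(x^f)$ with $\psi = \phi + \phi^2/2 + \phi^3/3 + \cdots$ and $\varphi(f)$ the number of integers up to f prime to f; and the diagonal types are the coefficients of $\prod_s (1 - x^s)^{-b_s}$. All names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def euler_product(counts, n_max):
    """Coefficients of prod_s (1 - x^s)^(-counts[s]) up to x^n_max (counts[0] unused)."""
    coeffs = [1] + [0] * n_max
    for s in range(1, n_max + 1):
        for _ in range(counts[s]):
            for i in range(s, n_max + 1):
                coeffs[i] += coeffs[i - s]
    return coeffs

def diagonal_types(side_max):
    w = side_max - 1                                   # largest weight required
    # a[s]: branches of s lines
    a = [0] * (w + 1)
    series = [1] + [0] * w
    for s in range(1, w + 1):
        a[s] = series[s - 1]
        for _ in range(a[s]):
            for i in range(s, w + 1):
                series[i] += series[i - s]
    # psi = phi + phi^2/2 + phi^3/3 + ...
    phi = [Fraction(x) for x in a]
    psi = [Fraction(0)] * (w + 1)
    power = [Fraction(1)] + [Fraction(0)] * w          # phi^0
    for k in range(1, w + 1):
        nxt = [Fraction(0)] * (w + 1)
        for i, coeff in enumerate(power):
            if coeff:
                for j in range(1, w + 1 - i):
                    nxt[i + j] += coeff * phi[j]
        power = nxt
        for i in range(w + 1):
            psi[i] += power[i] / k
    # b[s]: rings and branches of weight s
    b = [0] * (w + 1)
    for s in range(1, w + 1):
        total = Fraction(0)
        for f in range(1, s + 1):
            if s % f == 0:
                prime_to_f = sum(1 for m in range(1, f + 1) if gcd(m, f) == 1)
                total += Fraction(prime_to_f, f) * psi[s // f]
        assert total.denominator == 1                  # the sum must be a whole number
        b[s] = int(total)
    return b[1:], euler_product(b, w)

rings_and_branches, types = diagonal_types(9)
print(rings_and_branches)  # weights 1..8
print(types)               # sides 1..9
```

The two printed lists may be compared with the 'rings and branches' column of Table 4 and the 'diagonal types' column of Table 5.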



The enumeration of the number of diagonal types available for squares of different sizes thus
proceeds as in Table 1, using the series obtained in Table 4. The generating function for this
series is shown in Table 5.


Table 5. Calculation of the numbers of diagonal types

Columns 1 to 8 show the series after multiplication by (1 - x^s)^(-b_s), s = 1, 2, ..., 8.

Power                                                                    Diagonal   Side of
of x     1    2     3      4       5        6        7        8           types     square

 0       1                                                                     1        1
 1       1                                                                     1        2
 2       1    3                                                                3        3
 3       1    3     7                                                          7        4
 4       1    6    10     19                                                  19        5
 5       1    6    18     27      47                                          47        6
 6       1   10    32     59      79      130                                130        7
 7       1   10    44    107     167      218      343                       343        8
 8       1   15    69    204     344      497      622      951              951        9
 9       1   15   105    312     692     1049     1424     1753             2615       10
10       1   21   141    564    1314     2283     3158     4145             7318       11
11       1   21   201    912    2302     4699     7074     9377            20491       12
12       1   28   283   1519    4289     9644    15519    21770            57903       13
13       1   28   367   2287    7837    17680    33930    49393           163898       14
14       1   36   495   3699   13929    35451    70576   113346           466199       15
15       1   36   659   5603   24093    68667   138667   251514          1328993       16
16       1   45   833   8630   40800   133008   287758   546681          3799624       17


Summary

The note contains a sequence of theorems in combinatorial analysis connected with the numbers
of selections which can be made from objects of given numbers of types.

Enumerations are given of

(a) the number of different branches to be formed of n elements;

(b) the related problem of the enumeration of alkyl radicals;

(c) the number of rings consisting of branches; and

(d) the numbers of diagonal types of Latin squares up to seventeen elements in the side.

REFERENCES

P. A. Macmahon (1916). Chapter I. Camb. Univ. Press.

P. V. Sukhatme (1938). On bipartitional functions. Phil. Trans. A, 237, 375-409.







42.305a 


42 

THE LIKELIHOOD SOLUTION OF A PROBLEM 
IN COMPOUNDED PROBABILITIES* 


AUTHOR’S NOTE 

Examples in which an intricate problem "comes out neatly" are not
now so popular as they were with mathematical teachers. Apart,
however, from their undoubted educational value to students wishing
to familiarise themselves with the available resources of manipulative
technique, such examples do, by their definiteness and exactitude,
bring general problems to a sharper focus, and supply an element
very necessary to their comprehensive discussion.


Reprinted from Annals of Eugenics, Vol. XI, Pt. III, pp. 306-307, 1942.



42.306 


THE LIKELIHOOD SOLUTION OF A PROBLEM IN
COMPOUNDED PROBABILITIES

By R. A. FISHER


1.

If for an event with probability p, we observe a cases out of A, for a second event with
probability p', b cases out of B, and for a third event with probability pp', c cases out of C,
the likelihood of the combined observations has for its logarithm

$$a \log p + (A - a) \log(1 - p) + b \log p' + (B - b) \log(1 - p') + c \log pp' + (C - c) \log(1 - pp').$$

Differentiating with respect to p, we have

$$\frac{a + c}{p} - \frac{A - a}{1 - p} - \frac{(C - c)p'}{1 - pp'},$$

so that the equation of estimation found by maximizing the likelihood for variation of p is

$$(a + c)(1 - p)(1 - pp') - (A - a)p(1 - pp') - (C - c)pp'(1 - p) = 0,$$

or

$$p = \frac{(a + c) - (a + C)pp'}{(A + c) - (A + C)pp'},$$

or

$$\frac{p}{(a + c) - (a + C)pp'} = \frac{1 - p}{(A - a)(1 - pp')} = \frac{Ap - a}{(A - a)(c - Cpp')},$$

so that, using the similar equation for p', it follows that

$$\frac{Ap - a}{1 - p} = \frac{Bp' - b}{1 - p'} = -\frac{Cpp' - c}{1 - pp'}.$$

This may be written

$$p = \frac{a + \lambda}{A + \lambda}, \quad p' = \frac{b + \lambda}{B + \lambda}, \quad pp' = \frac{c - \lambda}{C - \lambda},$$

where λ stands for the common value of the three ratios above.

Here λ is a root of the quadratic equation

$$(A + \lambda)(B + \lambda)(c - \lambda) = (a + \lambda)(b + \lambda)(C - \lambda),$$

or

$$\lambda^2(A + B + C - a - b - c) + \lambda\{AB - ab + (a + b)C - (A + B)c\} + abC - ABc = 0.$$

From the first form of the equation the roots are always real, one lying between c and the
negative value of the smaller of a and b. The second root exceeds only two of the four
quantities -A, -B, -a and -b, and may be ignored.

The contribution to $\chi^2$ of the observations of a cases out of A is

$$(a - Ap)^2\left\{\frac{1}{Ap} + \frac{1}{A(1 - p)}\right\} = \frac{(a - Ap)^2}{Ap(1 - p)} = \frac{\lambda^2(A - a)}{A(a + \lambda)}.$$

Hence in all

$$\chi^2 = \lambda^2\left\{\frac{A - a}{A(a + \lambda)} + \frac{B - b}{B(b + \lambda)} + \frac{C - c}{C(c - \lambda)}\right\},$$

which may be used to test the significance of the observed discrepancy between the products
abC and ABc.




2. Extension to many independent events

If for s independent events having probabilities $p_1, p_2, \ldots, p_s$ we observe respectively $a_1$
out of $A_1$, $a_2$ out of $A_2$, ..., $a_s$ out of $A_s$, and for the combined event with probability
$p_1 p_2 \cdots p_s$, we observe in addition c out of C trials, then the equation for $p_r$ maximizing the
likelihood is

$$\frac{a_r}{p_r} - \frac{A_r - a_r}{1 - p_r} + \frac{c}{p_r} - \frac{(C - c)P}{p_r(1 - P)} = 0,$$

where P stands for the product $p_1 p_2 \cdots p_s$.

The equation of estimation may be written

$$\frac{A_r p_r - a_r}{1 - p_r} = \frac{c - CP}{1 - P}.$$

Hence, as in the simpler case of two components, we have the solution

$$p_r = \frac{a_r + \lambda}{A_r + \lambda}, \quad P = \frac{c - \lambda}{C - \lambda},$$

for all values of r from 1 to s.

The equation for λ,

$$(C - \lambda)(a_1 + \lambda)(a_2 + \lambda) \cdots (a_s + \lambda) = (c - \lambda)(A_1 + \lambda)(A_2 + \lambda) \cdots (A_s + \lambda),$$

is in general of degree s. One and only one real root lies between $-a_r$, when $a_r$ is the smallest
value of a, and +c.

As in the simpler case, $\chi^2$ has one degree of freedom, and when λ has been evaluated, it
may be calculated from the expression

$$\chi^2 = \lambda^2\left\{\frac{C - c}{C(c - \lambda)} + \sum_r \frac{A_r - a_r}{A_r(a_r + \lambda)}\right\}.$$

Obviously the coefficient of $\lambda^2$ in the expression for $\chi^2$ is the invariance of λ (the reciprocal
of the variance) appropriate to large samples, when the estimated values are substituted for
$p_1, p_2, \ldots, p_s$. The use of $\chi^2$ by-passes the conventional calculation of the standard error.
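
Since the admissible root is stated to be the only one between $-a_r$ (for the smallest a) and c, it can be located by simple bisection. The sketch below is an illustration with invented counts and hypothetical function names; it assumes that every $A_r$ exceeds the smallest a and that c < C, so that the two sides of the equation change order across the interval.

```python
from math import prod

def solve_lambda(a, A, c, C, iterations=200):
    """Bisection for the root of (C - l) prod(a_r + l) = (c - l) prod(A_r + l) in (-min(a), c)."""
    def f(lam):
        return (C - lam) * prod(x + lam for x in a) - (c - lam) * prod(X + lam for X in A)
    lo, hi = -min(a), c            # f(lo) < 0 < f(hi) under the stated assumptions
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def chi_square(a, A, c, C, lam):
    """lam^2 times the bracketed sum in the expression for chi-square."""
    return lam ** 2 * ((C - c) / (C * (c - lam))
                       + sum((X - x) / (X * (x + lam)) for x, X in zip(a, A)))

# Invented example with three component events.
a, A, c, C = [30, 40, 45], [50, 80, 60], 10, 100
lam3 = solve_lambda(a, A, c, C)
estimates = [(x + lam3) / (X + lam3) for x, X in zip(a, A)]
print(round(lam3, 4), [round(p, 4) for p in estimates], round(chi_square(a, A, c, C, lam3), 4))
```

For two components the same routine returns the admissible root of the quadratic of Section 1.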



4S.S3B 


43 

THE RELATION BETWEEN THE NUMBER OF 
SPECIES AND THE NUMBER OF INDIVIDUALS 
IN A RANDOM SAMPLE OF AN ANIMAL 
POPULATION * 

(With A. Steven Corbet and C. B. Williams)


AUTHOR’S NOTE 


The matter reprinted constitutes Part 3 (by R. A. Fisher) of a triple
communication from the author, A. Steven Corbet, and C. B. Williams
to the Journal of Animal Ecology. The tables are therefore
numbered starting with Table 9.

The author would like to emphasise that the chief initiative in
this discussion was taken by C. B. Williams, to whom also is due the
great variety of applications in which the distribution and the
discussion have been found useful. The author has been concerned only
with establishing the relationship of the new distribution to others
previously studied, notably the Poisson series and the negative
binomial; to demonstrating the fundamental mathematical relationships;
to providing tables of sufficient accuracy and range to facilitate
to the utmost the numerical calculations which workers were
inclined to make, and to illustrate the use of these tables as applied
to actual experimental data.

The function i/S given in Table 10 is somewhat intricate analyti¬ 
cally. It may be expressed in terms of the function 

S 

- *» —log (1 - x) « f 
a 

in the form 


i 1 , 2 ^ 1 / 2\ _ 

S 12 t rt/ 4t-14-e-‘ 


Reprinted from Journal of Animal Ecology, Vol. 12 , No. 1 , pp. 42 - 58 , 1943 . 



48.53b 


(where $s_2$ and $s_3$ are the sums of the inverse squares and cubes of the
natural numbers), a form useful for the larger values of t; or, in
ascending powers of t as


_L <2 _L_ #3 t ^ j4 I ^ 46 

12 ^ 144 ^ ^ 1080 * ^ 32400 ^ 


1 

27216 


which may be used for values below the range of the table. The 
argument of the table is 

N c* - 1 

logic -r “ logic — i - 

o t 


The discussion of the appropriate standard error of α calculated
from the two observational values N and S raises questions of some
interest, since it would seem possible to adopt either N or S as
representing the "size of the sample." The formula given in Section 4
is calculated for simultaneous variations of N and S associated with
a fixed degree of sampling activity, this being such that the average
values of specimens, N, and of species, S, are taken to be equal to
the values observed. A full discussion of this point would have to
go rather deep into the foundations of statistical inference.



43.54 

54 Relation between numbers of species and individuals in samples 


PART 3 . A THEORETICAL DISTRIBUTION FOR THE APPARENT 
ABUNDANCE OF DIFFERENT SPECIES 


By R. A. Fisher 


(1) The Poisson Series and the Negative
Binomial distribution

In biological sampling it has for some time been
recognized that if successive, independent, equal
samples be taken from homogeneous material, the
number of individuals observed in different samples
will vary in a definite manner. The distribution of
the number observed depends only on one parameter,
and may be conveniently expressed in terms
of the number expected, m, in what is known as the
Poisson Series, given by the formula

$$e^{-m}\,\frac{m^n}{n!}. \qquad (1)$$

Here n is the variate representing the number
observed in any sample, m is the parameter, the
number expected, which is the average value of n,
and need not be a whole number. Obviously, m will
be proportional to the size of the sample taken, and
to the density of organisms in the material sampled.
For example, n might stand for the number of
bacterial colonies counted on a plate of culture medium,
m for the average number in the volume of dilution
added to each plate. The formula then gives the
probability of obtaining n as the number observed.

The same frequency distribution would be ob¬ 
tained for the numbers of different organisms 
observed in one sample, if all were equally frequent 
in the material sampled. 

If the material sampled were heterogeneous, or if 
unequal samples were taken, we should have a mix¬ 
ture of distributions corresponding to different values 
of m. The same is true of the numbers of different 
organisms observed in a single sample, if the different 
species are not equally abundant. 

An important extension of the Poisson series is 
provided by the supposition that the values of m are 
distributed in a known and simple manner. Since 
m must be positive, the simplest suppiosition as to its 
distribution is that it has the Eulerian form (well 
known from the distribution of x*) such that the 
element of frequency or probability with which it 
falls in any infinitesimal range dm is 


df = J -yjdm. ( 2 ) 

If we multiply this expression by the probability, 
set out above, of observing just n organisms, and 
integrate with respect to m over its whole range 
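from 0 to ∞, we have

$$\int_{0}^{\infty} \frac{1}{(k-1)!}\,p^{-k}\,m^{k-1}\,e^{-m/p}\;e^{-m}\,\frac{m^{n}}{n!}\,dm,$$

which, on simplification, is found to have the value

$$\frac{(k+n-1)!}{(k-1)!\,n!}\;\frac{p^{n}}{(1+p)^{k+n}}, \qquad (3)$$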


which is the probability of observing the number n
when sampling from such a heterogeneous population.
Since this distribution is related to the negative
binomial expansion

$$\left(1-\frac{p}{1+p}\right)^{-k} = \sum_{n=0}^{\infty} \frac{(k+n-1)!}{(k-1)!\,n!}\left(\frac{p}{1+p}\right)^{n},$$

it has become known as the Negative Binomial distribution.
It is a natural extension of the Poisson series,
applicable to a somewhat wider class of cases.
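
The integration described above can be checked numerically. The following is a minimal sketch, assuming hypothetical values of k, p and n, and writing (k − 1)! as the gamma function Γ(k) so that k need not be a whole number (an assumption of this sketch). It compares a midpoint-rule evaluation of the mixture of Poisson series with the closed negative binomial form.

```python
import math


def poisson_pmf(n, m):
    # Poisson probability, evaluated through logarithms for stability
    return math.exp(-m + n * math.log(m) - math.lgamma(n + 1))


def eulerian_density(m, k, p):
    # Eulerian (gamma-type) density of m, with (k - 1)! written as Gamma(k)
    return math.exp((k - 1) * math.log(m) - m / p
                    - math.lgamma(k) - k * math.log(p))


def negative_binomial(n, k, p):
    # closed form obtained after integrating out m
    log_coeff = math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
    return math.exp(log_coeff + n * math.log(p) - (k + n) * math.log(1 + p))


def mixture_by_quadrature(n, k, p, upper=80.0, steps=80_000):
    # midpoint rule on (0, upper); the integrand is negligible beyond `upper`
    h = upper / steps
    return h * sum(
        poisson_pmf(n, (i + 0.5) * h) * eulerian_density((i + 0.5) * h, k, p)
        for i in range(steps)
    )


k, p, n = 2.5, 1.6, 3  # hypothetical values chosen for illustration
numeric = mixture_by_quadrature(n, k, p)
closed_form = negative_binomial(n, k, p)
total = sum(negative_binomial(j, k, p) for j in range(400))
print(numeric, closed_form, total)
```

The two evaluations should agree closely, and the closed-form probabilities should sum to unity over n = 0, 1, 2, ...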

The parameter p of the negative binomial distribution
is proportional to the size of the sample. The
expectation, or mean value of n, is pk. The second
parameter k measures in an inverse sense the variability
of the different expectations of the component
Poisson series. If k is very large these expectations
are nearly equal, and the distribution tends to the
Poisson form. If heterogeneity is very great k becomes
small and approaches its limiting value, zero.
This second parameter, k, is thus an intrinsic property
of the population sampled.
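
The approach to the Poisson form for large k can be seen numerically. In this sketch the mean pk is held fixed at a hypothetical value of 3 while k is increased; the largest discrepancy between the negative binomial and Poisson probabilities is recorded for each k. All names and values are illustrative assumptions.

```python
import math


def poisson_pmf(n, m):
    return math.exp(-m + n * math.log(m) - math.lgamma(n + 1))


def negative_binomial(n, k, p):
    log_coeff = math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
    return math.exp(log_coeff + n * math.log(p) - (k + n) * math.log(1 + p))


mean = 3.0  # hypothetical expectation pk, held fixed
gaps = []
for k in (1.0, 10.0, 10_000.0):
    p = mean / k  # p shrinks as k grows, so that pk stays equal to `mean`
    gap = max(abs(negative_binomial(n, k, p) - poisson_pmf(n, mean))
              for n in range(40))
    gaps.append(gap)
print(gaps)  # the discrepancies shrink as k increases
```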


( 2 ) The limiting form of the negative binomial, 
excluding zero observations 

In many of its applications the number n observed
in any sample may have all integral values including
zero. In its application, however, to the number of
representatives of different species obtained in a collection,
only frequencies of numbers greater than
zero will be observable, since by itself the collection
gives no indication of the number of species which
are not found in it. Now, the abundance in nature
of different species of the same group generally varies
very greatly, so that, as I first found in studying
Corbet’s series of Malayan butterflies, the negative
binomial, which often fits such data well, has a value
of k so small as to be almost indeterminate in magnitude,
or, in other words, indistinguishable from zero.
That it is not really zero for collections of wild
species follows from the fact that the total number
of species, and therefore the total number not included
in the collection, is really finite. The real
situation, however, in which a large number of
species are so rare that their chance of inclusion is
small, is well represented by the limiting form taken
by the negative binomial distribution, when k tends
to zero.

The limiting value k = 0 cannot occur in cases
where the frequency at zero is observable, for the
distribution would then consist wholly of such cases.
If, however, we put k = 0 in expression (3), write x
for p/(p + 1), so that x stands for a positive number
less than unity, varying with the size of the
sample, and replace the constant factor (k − 1)! in
the denominator, by a new constant factor, α, in
the numerator, we have an expression for the expected
number of species with n individuals, where
n now cannot be zero,

$$\frac{\alpha}{n}\,x^{n}. \qquad (4)$$
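
The passage to the limit can be illustrated numerically. In the sketch below the negative binomial probability is multiplied by Γ(k), standing for (k − 1)!, and compared with xⁿ/n for a very small k. The values p = 9 and k = 10⁻⁶ are hypothetical, and taking the new constant factor as unity is an assumption made only to expose the shape of the series.

```python
import math


def negative_binomial(n, k, p):
    log_coeff = math.lgamma(k + n) - math.lgamma(k) - math.lgamma(n + 1)
    return math.exp(log_coeff + n * math.log(p) - (k + n) * math.log(1 + p))


p = 9.0           # hypothetical, so that x = p / (p + 1) = 0.9
x = p / (p + 1)
k = 1e-6          # a value of k close to its limit, zero

for n in range(1, 6):
    scaled = negative_binomial(n, k, p) * math.gamma(k)  # remove 1/(k-1)!
    limit = x**n / n                                     # limiting term
    print(n, scaled, limit)
```

For each n the two printed values should agree to several significant figures.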

These two relationships enable the series to be
fitted to any series of observational data, for if S is
the number of species observed, and N the number
of individuals, the two equations

$$S = -\alpha \log_e (1-x), \qquad N = \frac{\alpha x}{1-x},$$

are sufficient to determine the values of α and x.
The solution of the equations is, however, troublesome
and indirect, so that to facilitate the solution
in any particular case I have calculated a table
(Table 9) from which, given the common logarithm


Table 9. Table of log10 N/α in terms of log10 N/S, for solving the equation
S = α log_e (1 + N/α), given S and N

log10 N/S      0        1        2        3        4        5        6        7        8        9
   0.4      0.61121    63084    65023    66939    68832    70701    72551    74382    76195    77990
   0.5      0.79766    81526    83271    85002    86717    88417    90105    91779    93442    95092
   0.6      0.96730    98356    99973  1.01579    03174    04759    06335    07902    09460    11010
   0.7      1.12550    14083    15607    17124    18634    20136    21631    23120    24602    26077
   0.8      1.27546    29008    30465    31916    33361    34801    36234    37663    39087    40506
   0.9      1.41920    43329    44733    46133    47528    48919    50305    51688    53066    54440
   1.0      1.55810    57177    58539    59898    61254    62605    63954    65299    66640    67979
   1.1      1.69314    70646    71975    73301    74623    75943    77261    78575    79886    81195
   1.2      1.82501    83805    85106    86404    87700    88994    90285    91574    92860    94144
   1.3      1.95426    96706    97984    99259  2.00532    01804    03073    04340    05605    06869
   1.4      2.08130    09389    10647    11902    13156    14409    15659    16908    18155    19400
   1.5      2.20644    21886    23126    24365    25602    26838    28072    29305    30536    31766
   1.6      2.32994    34221    35446    36670    37893    39114    40334    41553    42770    43986
   1.7      2.45201    46414    47627    48838    50048    51256    52464    53670    54875    56079
   1.8      2.57282    58484    59684    60884    62083    63280    64476    65672    66866    68059
   1.9      2.69252    70443    71633    72822    74011    75198    76385    77570    78755    79939
   2.0      2.81121    82303    83484    84664    85843    87022    88199    89376    90552    91727
   2.1      2.92901    94075    95247    96419    97590    98760    99930  3.01099    02267    03434
   2.2      3.04600    05766    06931    08095    09259    10422    11584    12745    13906    15066
   2.3      3.16225    17384    18542    19699    20856    22013    23168    24323    25477    26630
   2.4      3.27783    28936    30087    31238    32389    33539    34688    35837    36985    38133
   2.5      3.39280    40426    41572    42717    43862    45006    46150    47293    48436    49578
   2.6      3.50719    51860    53001    54141    55280    56419    57558    58696    59833    60970
   2.7      3.62106    63243    64378    65513    66648    67783    68915    70048    71181    72313
   2.8      3.73445    74577    75707    76838    77968    79097    80227    81355    82484    83611
   2.9      3.84739    85866    86992    88119    89244    90370    91495    92619    93743    94867
   3.0      3.95991    97114    98236    99358  4.00480    01602    02723    03843    04964    06084
   3.1      4.07203    08322    09441    10560    11678    12795    13913    15030    16147    17263
   3.2      4.18379    19494    20610    21725    22839    23954    25068    26181    27295    28408
   3.3      4.29520    30632    31744    32856    33967    35079    36189    37300    38410    39520
   3.4      4.40629    41738    42847    43956    45064    46172    47280    48387    49494    50601
   3.5      4.51707    52814    53920    55025    56131    57236    58340    59445    60549    61653

The total number of species expected is consequently

$$\sum_{n=1}^{\infty} \frac{\alpha}{n}\,x^{n} = -\alpha \log_e (1-x),$$

so that our distribution is related to the algebraic
expansion of the logarithm, as the negative binomial
distribution is to the binomial expansion. Next, it
is clear that the total number of individuals expected
is

$$\sum_{n=1}^{\infty} \alpha\,x^{n} = \frac{\alpha x}{1-x}.$$


of N/S, we may obtain that of N/α. Five-figure
logarithms are advisable, such as those in Statistical
Tables. If x be eliminated from the two equations,
it appears that

$$N = \alpha\,(e^{S/\alpha} - 1), \qquad S = \alpha \log_e\!\left(1 + \frac{N}{\alpha}\right),$$

from which Table 9 has been constructed.
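
Where a table is not at hand, the equation S = α log_e (1 + N/α) may be solved directly. The sketch below uses simple bisection, which is a substituted technique and not the tabular method of the text; the function name `fit_log_series` is hypothetical. It is applied to the values S = 240 and N = 15609 used in the worked example of the next section.

```python
import math


def fit_log_series(S, N):
    """Solve S = alpha * log(1 + N/alpha) for alpha; return (alpha, x)."""
    if not 0 < S < N:
        raise ValueError("need 0 < S < N")

    def excess(alpha):
        # increases steadily with alpha, from -S towards N - S
        return alpha * math.log1p(N / alpha) - S

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:        # widen the bracket until the root is enclosed
        hi *= 2.0
    for _ in range(200):         # bisection
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha, N / (N + alpha)


alpha, x = fit_log_series(240, 15609)
print(alpha, x)
```

The value of x follows from N = αx/(1 − x), that is, x = N/(N + α).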



48.56 


56 Relation between numbers of species and individuals in samples


(3) Fitting the series 

The use of the table is shown, using Williams’s
extensive data for the Macrolepidoptera at Harpenden
(total catch for four years). Symbols +
and − are used to indicate numbers to be added
and subtracted respectively.


Symbol      Number      Common logarithm
S              240         −2.38021
N           15,609         +4.19338
N/S                         1.81317

From the table
                     log (N/S)     log (N/α)
                      −1.81        −2.58484
                      +1.82        +2.59684
Difference             0.01         0.01200
Proportional parts     0.00317      0.00380
                       1.81317      2.58864

Then
Symbol      Number      Common logarithm
N/α                        −2.58864
N                          +4.19338
α           40.248          1.60474

For constructing the distribution we should then
calculate

N        15609
N + α    15649.248
x        0.9974281.
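
The arithmetic of this example can be retraced as follows. The sketch takes the two adjacent entries of Table 9 quoted in the text (2.58484 at 1.81 and 2.59684 at 1.82) and interpolates linearly between them; the variable names are illustrative.

```python
import math

S, N = 240, 15609
log_ratio = math.log10(N) - math.log10(S)       # log10 (N/S), about 1.81317

# adjacent entries of Table 9 quoted in the worked example
t0, v0 = 1.81, 2.58484
t1, v1 = 1.82, 2.59684

# linear interpolation ("proportional parts")
log_N_over_alpha = v0 + (log_ratio - t0) * (v1 - v0) / (t1 - t0)

alpha = 10 ** (math.log10(N) - log_N_over_alpha)
x = N / (N + alpha)
print(round(log_N_over_alpha, 5), round(alpha, 3), round(N + alpha, 3), x)
```

Small differences in the last figure from the hand calculation are to be expected, since the text works with five-figure logarithms.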


The quantity α is independent of the size of
sample, and is proportional to the number of species
of the group considered, at any chosen level of
abundance, relative to the means of capture employed.
Values of α from different samples or obtained
by different methods of capture may therefore
be compared as a measure of richness in species. To
this end we shall need to know the sampling errors
by which an estimate of α may be affected.


(4) Variation in parallel samples 

Whatever method of capture may be employed,
it is to be expected that a given amount of activity
devoted to it, e.g. a given number of hours exposure
of a light-trap, or a given volume of sea water passed
through a plankton filter, will yield on different
occasions different numbers of individuals and of
species, and, consequently, varying estimates of α.
The amount of variation of these kinds attributable
to chance must form the basis of all conclusions as
to whether variations beyond chance have occurred
in the circumstances in which two or more samples
were made.

In strictly parallel samples, i.e. equivalent sampling
processes applied to homogeneous material, the
numbers caught of each individual species will be
distributed in a Poisson series, and it easily follows
that the same is true of the aggregate number, N,
of all species. Since N is a large number of hundreds
or thousands, this is equivalent to N being normally
distributed with a variance equal to its mean, so that
to any observed value N we may attach a standard
error (of random sampling) equal to ± √N.

For the variation of S we must obtain the distribution
of species according to the number m expected
in the sample; modifying expression (2) in
the same way as (3) has been modified, this is found
to be

$$\alpha\,e^{-m/p}\,\frac{dm}{m}. \qquad (5)$$

The probability of missing any species is $e^{-m}$, so
that the contribution to the sampling variance of S
due to any one species being sometimes observed
and sometimes not, is

$$e^{-m}\,(1 - e^{-m}).$$

Multiplying this by the frequencies in (5) and
integrating over all values of m, we have

$$\alpha \log_e \frac{2N+\alpha}{N+\alpha},$$

which is the sampling variance of S. For large
samples this is approximately (0.6931) α.
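
The integration for the sampling variance of S can be verified numerically. The sketch below assumes that the species are distributed in m as α e^{−m/p} dm/m with p = N/α (which follows from N = αx/(1 − x) and x = p/(1 + p)), and uses the values of α and N from the worked example. A midpoint rule is used purely for illustration.

```python
import math

alpha, N = 40.248, 15609      # values from the worked example
p = N / alpha                 # assumed relation, since N = alpha * x / (1 - x)


def integrand(m):
    # frequency alpha * exp(-m/p) / m, times the variance exp(-m)(1 - exp(-m))
    return alpha * math.exp(-m / p) * math.exp(-m) * (-math.expm1(-m)) / m


upper, steps = 60.0, 60_000   # the integrand is negligible beyond m = 60
h = upper / steps
numeric = h * sum(integrand((i + 0.5) * h) for i in range(steps))

closed_form = alpha * math.log((2 * N + alpha) / (N + alpha))
large_sample = math.log(2) * alpha   # the approximation (0.6931) alpha
print(numeric, closed_form, large_sample)
```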

Variations of S and N in parallel samples are not,
however, independent. When present, a species must
contribute on the average $m/(1 - e^{-m})$ individuals,
which exceeds the expectation in all samples by

$$\frac{m\,e^{-m}}{1 - e^{-m}},$$

and as the frequency of occurrence is $1 - e^{-m}$, each
species must contribute $m\,e^{-m}$ to the covariance of
S and N. The covariance is thus found to be

$$\frac{\alpha N}{N+\alpha}.$$


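From these three values it is possible by standard
methods to find the sampling variance of S in
samples having a given number of specimens N,
which is

$$V(S),\ \text{given } N, \;=\; \alpha \log_e \frac{2N+\alpha}{N+\alpha} \;-\; \frac{\alpha^{2} N}{(N+\alpha)^{2}},$$

and the variance of α,

$$V(\alpha) = \frac{\alpha^{3}\left\{(N+\alpha)^{2} \log_e \dfrac{2N+\alpha}{N+\alpha} - \alpha N\right\}}{(SN + S\alpha - N\alpha)^{2}}.$$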
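
The text does not spell out the "standard methods". One reading, offered here as an assumption, is the usual large-sample argument: the variance of S for fixed N is the variance of S less the part accounted for by its regression on N, and the variance of α follows by dividing by the square of the rate of change of S with α at fixed N.

```latex
\begin{aligned}
V(S) &= \alpha\log_e\frac{2N+\alpha}{N+\alpha}, \qquad
V(N) = N, \qquad
\operatorname{Cov}(S,N) = \frac{\alpha N}{N+\alpha},\\[4pt]
V(S \mid N) &= V(S) - \frac{\operatorname{Cov}(S,N)^{2}}{V(N)}
 = \alpha\log_e\frac{2N+\alpha}{N+\alpha} - \frac{\alpha^{2}N}{(N+\alpha)^{2}},\\[4pt]
\left.\frac{\partial S}{\partial \alpha}\right|_{N}
 &= \log_e\!\left(1+\frac{N}{\alpha}\right) - \frac{N}{N+\alpha}
 = \frac{SN + S\alpha - N\alpha}{\alpha\,(N+\alpha)},\\[4pt]
V(\alpha) &\approx \frac{V(S \mid N)}{\left(\partial S/\partial\alpha\right)^{2}}
 = \frac{\alpha^{3}\left\{(N+\alpha)^{2}\log_e\dfrac{2N+\alpha}{N+\alpha} - \alpha N\right\}}
        {(SN + S\alpha - N\alpha)^{2}}.
\end{aligned}
```

The third line uses S = α log_e (1 + N/α) to replace the logarithm by S/α.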


We may, therefore, complete the example of the
last section by calculating the standard error of α.
Using the values obtained, the variance comes to
1.1251, of which the square root is 1.0607.

The estimate obtained for α, 40.248, has, therefore,
a standard error of 1.0607, available for comparison
with like estimates.
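
The figures just quoted can be reproduced from the values S = 240, N = 15609 and α = 40.248. The sketch below evaluates the variance of S for given N and the variance of α in the forms implied by the example; variable names are illustrative.

```python
import math

S, N, alpha = 240, 15609, 40.248   # values from the worked example

log_term = math.log((2 * N + alpha) / (N + alpha))

# sampling variance of S in samples with a given number of specimens N
var_S_given_N = alpha * log_term - alpha**2 * N / (N + alpha) ** 2

# sampling variance of alpha and its square root, the standard error
var_alpha = (alpha**3 * ((N + alpha) ** 2 * log_term - alpha * N)
             / (S * N + S * alpha - N * alpha) ** 2)
se_alpha = math.sqrt(var_alpha)

print(round(var_S_given_N, 4), round(var_alpha, 4), round(se_alpha, 4))
```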


(5) Test of adequacy of the limiting distribution

From the manner in which the distribution has
been developed it appears that we never have theoretical
grounds for supposing that k is actually zero;
but, on the contrary, must generally suppose that
in reality it has a finite, though perhaps a very small,
value. Our reasons for supposing this small value
to be negligible must always be derived from the
observations themselves. It is, therefore, essential
to be able to test any body of data in respect to the
possibility that in reality some value of k differing
significantly from zero might fit the data better than
the value zero actually assumed.

The most sensitive index or score by which any
departure of the series of frequencies observed from
those expected can be recognized, is found by the
general principles of the Theory of Estimation, as,
for example, in the author’s Statistical Methods for
Research Workers, to be

$$\sum_{n} a_{n}\left(1 + \tfrac{1}{2} + \tfrac{1}{3} + \dots + \frac{1}{n-1}\right),$$

where $a_n$ is the number of species observed with n individuals
in each. If the values of $a_n$ conformed accurately
with expectation, the total score would be
equal to

$$\frac{S^{2}}{2\alpha}.$$

If, on the contrary, the series were better fitted by
a negative binomial with a value of k differing from
zero, we should expect the difference

$$\sum_{n} a_{n}\left(1 + \tfrac{1}{2} + \dots + \frac{1}{n-1}\right) - \frac{S^{2}}{2\alpha}$$

to show a positive discrepancy.
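
The expected value of the score may be checked by direct summation. The sketch below assumes that the expected number of species with n individuals is α xⁿ/n and that the score weights each such species by 1 + 1/2 + ... + 1/(n − 1); it uses the fitted values α = 40.248 and x = 0.9974281 from the example of section (3). The observed frequencies are not reproduced here, so only the expectation is computed.

```python
alpha, x = 40.248, 0.9974281   # fitted values from the worked example

harmonic = 0.0        # 1 + 1/2 + ... + 1/(n - 1); zero when n = 1
expected_score = 0.0
species = 0.0
power = 1.0
for n in range(1, 30001):          # terms beyond this are negligible
    power *= x
    term = alpha * power / n       # expected number of species with n individuals
    expected_score += term * harmonic
    species += term
    harmonic += 1.0 / n

print(species, expected_score, species**2 / (2 * alpha))
```

The summed score should agree with S²/2α computed from the summed number of species, which in turn should be close to 240.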

Applying this test to Williams’s distribution for
240 species of Macrolepidoptera, one finds, after a
somewhat tedious calculation,

$\sum\{a_n(1 + \tfrac{1}{2} + \tfrac{1}{3} + \dots + \tfrac{1}{n-1})\}$    +724.86
$S^{2}/2\alpha$                                                            −715.57
Difference                                                                   +9.29

The series, therefore, shows a deviation in the direction
to be expected for the negative binomial, though
apparently quite a small one. In order to test the
significance of such discrepancies, I give in Table 10,
for the same range of observable values of the average
number of specimens in each species N/S, the values
of i/S, where i is the quantity of information, in
respect of the value of k, which the data supply.

Table 10. The amount of information respecting k,
supposed small, according to the numbers of individuals
(N) and species (S) observed

log10 N/S     i/S        log10 N/S     i/S
   0.4       0.1971
   0.5       0.2882
   0.6       0.3914         2.1       3.1047
   0.7       0.5054         2.2       3.3606
   0.8       0.6295         2.3       3.6260
   0.9       0.7639         2.4       3.9009
   1.0       0.9076         2.5       4.1854
   1.1       1.0608         2.6       4.4791
   1.2       1.2232         2.7       4.7825
   1.3       1.3950         2.8       5.0954
   1.4       1.5762         2.9       5.4178
   1.5       1.7665         3.0       5.7498
   1.6       1.9661         3.1       6.0912
   1.7       2.1751         3.2       6.4421
   1.8       2.3934         3.3       6.8026
   1.9       2.6211         3.4       7.1726
   2.0       2.8582         3.5       7.5521


Entering the table with our value 1.81317 for
log10 N/S we have i/S = 2.4656, or i = 591.7. This
quantity may now be used for two purposes. In the
first place it is the sampling variance of the discrepancy
observed, so that, taking its square root,
the standard error is found to be 24.33. This suffices
to test the significance of the discrepancy, since
9.29 ± 24.33 is clearly insignificant.

If, on the contrary, a significant discrepancy had
been found, an estimate of the value of k required
to give a good fit to the data could be made by
dividing the discrepancy by i. In fact

$$\frac{9.29}{591.7} = 0.016$$

would have been the value of k indicated by the
data, if any value other than zero had been required.
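
The two uses of i can be retraced in a few lines. The sketch takes the value i/S = 2.4656 as stated in the text, together with S = 240 and the observed discrepancy 9.29; the variable names are illustrative.

```python
import math

S = 240
info_per_species = 2.4656     # i/S as stated in the text
discrepancy = 9.29            # observed score less its expectation

i = info_per_species * S                   # quantity of information, about 591.7
standard_error = math.sqrt(i)              # about 24.33
ratio = discrepancy / standard_error       # well below 2, hence insignificant
k_estimate = discrepancy / i               # about 0.016

print(round(i, 1), round(standard_error, 2), round(ratio, 2), round(k_estimate, 3))
```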


REFERENCES 

Fisher, R. A. & Yates, F. (1943). ‘Statistical tables for
biological, agricultural and medical research’ (2nd ed.).
Edinburgh.

Fisher, R. A. (1941). ‘Statistical methods for research
workers’ (8th ed.). Edinburgh.



INDEX 


Note to the user of this index. This is a loose index, not a tight one. Thus, closely related topics
may not appear together. This was done so that individual topics may appear under diverse
headings. The user should look under many headings for any special topic of particular interest.


A distribution 
definition 14.660 

distinction from Type C 14.671-672 
Abruptness (frequency curves)
and superefficient estimation 10.347-357
corrections for
failure 29.306-308
need 10.319

Acclimatization to altitude 34.422 
Accuracy 

formulas 2.757, 27.256 
observations 2.757-770 
plating method 4.325-369 
Advantageous mutations
division 81.355-369 
dominance 31.356 
quantitative effect 81.356 
Agrostis 

wheat weed 8.130 
Alkyl radicals 
enumeration 41.397 
Allan, F. K. 

computer 22.534 
Allopolyploidy 

test by discriminant functions 32.185-188 
Alopecurus

wheat weed 3.130 
Alpha particles 4.330 
Analysis of variance 

Behrens-Fisher problem 36.178-180 
correlation ratio 18.811-812 
discriminant function 32.183-185, 88.377, 
84.424-425
early use 8.111-124 
extension (non-central) 14.668-672 
polynomial, expected values 8.119-124 
potato yields 18.203,207,211 
regression, multiple 18.811, 86.238 
single df 8.121 
wheat yields 8.111-114 
significance (testing) 

early approximate 8.121-122 
with z 12.810-812 , 

single classification 4.328-329, 18.810, 86.397 


Analysis of variance (Continued)
single df among polynomials 8.121
standard cases 18.810-812
to test technique 4.328-329
Ancillary information 26.48-54, 27.256-257
Anderson, Edgar
Iris 32.170,185,188
Angular transformation
in population genetics 19.205-220 
effect of bias 19.206 
more suitable for some purposes 10.325 
Antibiotics 4.345-347 
Arenaria

wheat weed 8.131

Association (see Contingency tables, Correlation,
Fourfold tables)

B distribution 

and generalized distance 88.379 
definition 14.063-664 
table 14.665 
Bacterial density 

accuracy of plating method 4.325-359 
B. coli in milk 4.350,352 355 
soil bacteria 4.325-359 
sugar refinery 4.350
B. coli 4.350-355,359 
Balanced incomplete blocks 

and orthogonal cubes 40.285-290
Barbacki, S.

systematic designs 28.189-193 
Barnard, Mildred M. 
confounding 89.341,353 
discriminant functions 82.179, 88.376,380 
Bartlett, Maurice S.

Behrens-Fisher 86.175,180
discriminant functions 84.423,429 
Bateman, H. 

radioactive decay 4.330,358 
Bayes, Thomas 

definition of probability 87.245,258 
inverse probability 10.324-327,368 22.528, 
84.285-286 

Bayes postulate (see Inverse probability) 





Behrens, W.-V.

difference of means 26.397, 36.173a-180
Behrens-Fisher problem (see also Fiducial)
general 26.390a-308, 36.173a-180
importance of weights different from apparent
36.179-180

in terms of analysis of variance 36.178-179
Sukhatme’s table 86.177 
Bernoulli, Daniel 
binomial 12.805, 14.654, 24.286 
Bessel, F. W.

accuracy formula 2.757a, 27.255 
letter from Gauss 27.249,258 
method of moments 20.318 
Bessel functions 

intrinsic accuracy of Cauchy distribution 
11.715 

multiple correlation coefficient 14.663 
Binomial 

fitting by chisquare, loss in intrinsic accuracy
11.702

historical references 12.805. 14.654, 24.286 
negative 88.181a-187, 48.57 
relation to chisquare 18.360 
relation to c and F distributions 18.362
Blakeman's criterion 12.812 
Blocks (agricultural) 

and soil heterogeneity 18.208 
oversize 18.208 
Blocks (statistical) 
balanced incomplete 

and orthogonal cubes 40.285-290 
Boole, George

inverse probability 10.311,326,22.531,27.248, 
258 

Bortkiewicz, L. von 
deaths from horse-kick 4.330,358
Bose, R. C.

generalized distance 14.653a, 83.379-381,386, 
86.242,249 
Bowley, A. L.

chisquare 6.86a,91,04, 7.3-9 
Branches (combinatorial) 
enumeration 41.396 
Breed, R. S. 

B. coli 4.350-355,359 
Brownlee, J. 

chisquare 8.442-443,449 
Buddin, W. 

soil bacteria 4.350-352,359 
Burnside, William 
foundations 11.700 

C distribution 

and generalized distance 14.653a, 88.380-381 
definition 14.671 

distinction from Type A 14.671-672 
relation to non-central F 14.671 
Canonical variates 

and multiple correlation coefficient 14.658- 
659 


Cauchy distribution, location of 
efficiency 11.707,715-716 
intrinsic accuracy 11.716 
iterative estimate 11.707 
maximum likelihood 11.707 
mean ineffective 11.702 
median 11.707,715 
Center of location 
definition 10.309 
of Type HI 10.339 
of Type IV 10.340 
Characteristic function 
and moments 80.1 

as moment generating function 80.1-2 
definition 80.1 

giving translation formulas 80.2 
inversion 24.290-291 
of sufficient statistic 24.290 
Chisquare (see next four headings and Con¬ 
tingency tables, Fourfold tables, Pearson 
curves—Type III) 

Chisquare, minimum as estimate 
efficiency 8.446-447 
loss of intrinsic accuracy 11.721-722 
for binomial 11.722 
relation to maximum likelihood 
in general 9.95-100, 10.357-358 
linkage example 9.95-97 
Chisquare as goodness of fit test 
abnormal distribution 

estimation inconsistent 8.444 
estimation inefficient 8.444-447 
hypothesis false 8.443-^4 
applied 

to Poisson index 4.337-342,351,354 
use of sextiles 4.340,351,354 
approximate test for regression 6.596a-612 
Bowley’s views 7.4-9
Brownlee's experiments 8.442-449 
components due to inefficiency and variation 
9.98-100 

degrees of freedom 6.80a, 7.1-9, 8.442, 
10.314-315, 12.806-807 
derivation from Poisson 6.89-89a 
differential death-rates 6.92(footnote) 
effect of fitting parameter 

effect of non-linear restrictions 6.94 
efficiently 8.441-448 
inefficiently 8.447-448 
several 7.1-9 

formulas for 2X2 table 7.5-6 
differences 7.5-6 
in small samples 8.449 
Pearson's views 7.4-9
special contingency tables 42.306-307 
Yule’s experiments 7.6-7 
Yule's suggestion 4.338 
Chisquare as Poisson index 
applied to data 4.336-356 
derivation 4.335, 12.807 
exceptional variability 4.341-344,358 





Chisquare as Poisson index (Continued)
goodness of fit 4.337-342,361,354 
interference pattern 4.34^350,358 
warning signal 4.341-344,358 
Chisquare distribution
as Lexis index 12.807
as Poisson index 4.335 
as variance distribution 12.807 
derivations 5.89~80a, 18.353-354 
distribution of logarithm 12.808 
Elderton's tables 8.122, 4.335-337,342.358, 
6.88,92, 6.597-601. 7.1-7, 12.806, 14.656 
fiducial distribution of variance component

26.397 

Helmert 18.354,365 
history 13.355(footnote)
non-central 

definition 14.669-670 
relation to Type II 14.670
probability integral 18.356 358 
relation to Poisson 18.357-358 
relation to z distribution 12.806,809 
Schuster’s test in harmonic analysis 16.54
single sufficient statistic 24.292 
Cirsium 

wheat weed 3.130
Combinatorial theorems
alkyl radicals 41.397
branches 41.396

diagonal types of Latin squares 41.400-401
kinds of samples of n 41.395 
fixed weight 41.39,5 
rings of branches 41.399 
rings of parts 41.397-398 
Comparative accuracy of estimates
criterion for choice 2.770
of scale from the normal 2.758-770
Complex experiments 17.511-513

potatoes 18.208-212 
Component of variance 

fiducial distribution 26.397-398
Confidence limits 26.393,398, 26.38a,50-51 
Configuration 

determining accuracy 24.301-303 
recovering information on 24.301 -303,305-306 
Confounded subgroup 39.342
Confounding of factorial experiments
confounded subgroup 39.342
highly symmetric cases 89.347,353
intrablock subgroup 89.342
p levels 40.283-288
more levels 39.343-344
p” levels 40.288-290
relation to group theory 89.341-353
seven to fifteen factors at two levels 89.344-353

Consistency 

application to grouping correction 10.317-321 
definition 8.444, 10.309,316, 11.702-703 
easily obtained 2.758 
examples 10.317-323 


Contingency tables (see also Fourfold tables)
and chisquare 7.1-9 
and discriminant functions 84.426-429 
degrees of freedom 6.87-94, 7.1-9, 8.442, 
10.314-315, 12.806-807 
differential death-rates 6.92(footnote)
in general 8.442-450 
kinds of table 7.1 -9 

large sample equivalence of various tests 7.3 
Poisson index 4.335 
2 × n table 6.92-93
Convolvulus

wheat weed 8.130
Correlation

history of distribution 6.599, 10.315, 12.811,
14.655, 86.243
interclass distribution 1 
intraclass distribution 1, 4.328-329, 12.809
relation to analysis of variance 12.810
multiple (see Multiple correlation)
of estimates 8.445-446, 11.704-706
of residuals 8.125-126 
ratio 12.811-812 

and analysis of variance 12.811 
and regression 6.597,602-603
relation to variance ratio 6.603
transformation (r = tanh z) 8.125, 12.809-810, 14.655
Criminal twins 26.48 
Crop variation 18.201-213 
Crump, L. M.

soil bacteria 4.359
Cumulants 

and cumulative function 80.2 
and moments 30.3 

defined as “cumulative moment functions”

20.201 

definition 80.2 
grouping error 80.3-4 
in general 80.1-14a
of k-statistics (or sample moments)
and partitions 20.205-206 
computation, univariate 20.207-209 
formulae 20.209 -215 
involving unit parts 20.206 207 
multivariate 20.217-230 
notation and generation 20.204-206 
of sample moments 20.199-238 
of tests of normality 21.22 
of transforms 80.7-8 
of z 80.11-14 

relations to moments 20.201 202,215-216 
specifying distribution 80.9-11
Cutler, D. W.

soil bacteria and protozoa 4.326,331-344,359, 
10.363-366 

Death-rates

differential 6.92(footnote) 

S (measure of kurtosis)
definition 21.24 





S (measure of kurtosis) (Continued)

moments and product moments 21.25-28 
recurrence relation 21.24-25 
De Moivre, A.

normal cumulative 12.302-365
De Morgan, A.

inverse probability 27.247,258
Difference of means (see Analysis of variance,
single classification, Behrens-Fisher 
problem. Student’s t) 

Differential death-rates 2.Q2(footnote) 
Differential equations 
genetic 19.206-214, 21.355-369 
Diffusion of advantageous genes 21.355-360 
ambiguity of velocity 21.356-360 
differential equation 21.356-359 
particular solution 21.361-360 
random walk 21.360-361 
Digamma function 24.289-293 
Dilution series 
information 10.366 
made continuous 26.51-63 
maximum likelihood solution 10.363-364 
short method, efficiency of 10.365-366 
Diminishing returns (agricultural) 2.115,135 
Discriminant function 
analysis of variance 22.183-185 
and analysis of variance 22.377 
and light on multiple regression 24.422 
and Hotelling's T 22.378 
and Mahalanobis’ generalised distance 
22.377-381 

and partial regression for dummy variable 
26.246 

and regression 22.376-381 
computation 

more than two populations and one con¬ 
trast 22.185-188 
two populations 22.179-182 
definition 22.170 
errors in Paper 32 24.423 
general 22.179-188, 28.376-386 (384-386 in 
error; cf. 88.38i6a-385b or 84) 
multiple contrasts 28.381-386 (384-386 in 
error; cf. 88.385a-385b or 84) 
degree of freedom 88.385 a 
multiple correlation 82.185 
probability of misclassification 22.183 
relation to regression 82.184-185 
simultaneous for two sets (canonical variates) 
24.426-429 

incomplete tests of significance 84.427-429 
test of significance 22.186, 28.378 
testing a proposed discriminant 24.423-426 
and analysis of variance 24.424-425
Distribution(s) (see also A distribution, B distribution,
C distribution, F distribution,
Pearson curves, Poisson distribution and
series, Student’s t, z distribution, etc.)
for which moments are efficient 10.355-356 
problems of 10.309,314-315 


Distribution (s) (Continued) 

specification by cumulants 80.8-14a 
Distribution theory 

of common statistics, history 14.664-657 
Divergence coefficient 12.807 
Double exponential distribution 

distribution of midrange from rectangular 
10.340 

location 24.207-303 
information 24.298 

provided by median 11.716, 24.298-300 
likelihood curve 24.300 

recovered by configuration 24.301-302 
relation to the mean deviation 2.769 
Dummy treatments 
as replications 18.202-205 
isolation of df 18.205 

Eddington, A. S. 

precision of estimates 2.757a, 762 
Eden, T. 

crop variation 18.201-213 
winter oats 17.511 
Edgeworth, F. Y. 

inverse probability 10.311,368,27.248.251.258 
Efficiency (see also Location, Method of 
moments, Pearson curves. Scaling) 
criterion 6.598, 8.446-446, 10.310,316 
definitions 10.309, 11.703,714 
loss for other statistics 11.711 
measured by correlation with efficient sta¬ 
tistic 8.446, 10.317 
of chisquare 8.446-447 

of non-normally distributed statistics 10.351
of some maximum likelihood estimates 
11.710-711 

of median in location 
Cauchy 11.716-716 
double exponential 11.716-717 
normal 8.446, 11.706-707 
of method of moments 

negative binomial 88.183-185 
of simultaneous fitting 

measured by generalized variance 28.183 
negative binomial 88.183-185 
of weighting 11.711-712 
Efficient statistics 

and minimum chisquare 8.446-447 
correlation of two such 8.445-446 
correlation with inefficient 8.446 
correlational properties 11.704-706 
maximum likelihood 8.445 
superefficiency 10.347-357 
Elderton, W. P.

chisquare tables 8.122, 4.335-337,342,358, 
6.88,92, 8.597-601, 7.1-7, 12.806, 14.665 
method of moments 20.810-312,318 
Engberding, D. 

soil bacteria 4.369-352,869 
Entropy 

and information 26.47 





Equisetum 

wheat weed 8.130 
Error functions 12.805 
Errors of counting 4.352-354
Estimated deviates 
sampling error 2S.xxx-xxxiii 
Estimation (see also Location, Method of
moments, Scaling)
bias 26.42 

by maximum likelihood 10.323-324,327-330 
comparative accuracy 
criterion for choice 2.770 
of scale from the normal 2.758-770 
consistent 26.41-42 
criteria 10.310-317 
in multichotomies 9.94a-100
inconsistent 8.444 
inefficient 8.444 

lack of non-parametric theory 27.250
minimum chisquare 8.446-447, 11.721-722, 
29.316 
of error 

and reduction of error 17.508 
valid and invalid 17.50tV-509 
problems of 10.310,314-315, 11.701-725 
simultaneous 26.54
superefficient 10.347-357
unsolved problems 26.53-54, 27.257
variance 26.42-46
Evolutionary variance 

selective advantage of genes causing 19.218- 
219 

Experimental sampling 7.7, 8.442-443, 10.350 
Extreme observations 

limiting distributions 15.180-190 
change with n 15.182-188 
from normal 15.185 190 
penultimate form 15.186-190 
Eye color 

and hair color 84.426-429
Ezekiel, M.

fiducial graphs 22.534

F distribution 
non-central 

definition 14.671 
relation to Type C 14.671
Factorial experiments, confounding of 
more levels 39.343-344 
p levels 40.283-288
p*- levels 40.288-290 
relation to group theory 89.341-353 
seven to fifteen factors at two levels 89.344-353

Fairfield Smith, H. 

discriminant functions 88.376-386 
Fiducial (see also Behrens-Fisher problem) 
distribution 

difference of normal means 25.396-397, 
86.174-180 

normal variance component 25.397-308 


Fiducial, distribution (Continued)

of a new normal observation 25.303-304 
of a new normal sample 25.394 
of normal parameters 26.394-395 
graphs 22.534 

probability 25.391-398, 22.533.535, 86.173a 
correlation coefficient 22.533-535 
distinguished from confidence 26.38a,50- 
51 

generality in univariate case 26.395 
nature 26.391-393 
statements 27.253-256 
from configuration 27.257 
from sufficient statistics 27.250
Field experiments (agricultural) 
arrangement 17.503-513 
Field theory (mathematical) 

and confounding factorial experiments at p' 
levels 40.288-290 
Filon, L. N. G.

standard errors 10.329,368 
Finite samples 26.46-48
Fisher, R. A. 

(references to other papers here reprinted)
(1) 11.703, 14.655; (3) 6.607, (6) 4.358, 
6.596-597, 7.1, 8.442, 14.655; (6) 14.656;
(7) 11.665; (8) 5.86, 6.696, 9.95, 11.704, 
14.655; (10) 4.358, 8.445, 11.701,712, 
14.656, 20.200,230, 29.308,317; (11) 

24.288,298, 29.317; (14) 88.379; (16) 
86.240; (20) 21.16; (26) 86.179; (82) 
88.376,3B5b, 84.422,425; (88) 86.242, 
246; (86) 84.422; (89) 40.283 
''Design of Experiments’’ 89.340a, 40.283 
"Statistical Methods for Research Workers’’ 
9.95, 14.656, 17.509, 24.292. 29.317, 
86.173a-174, 48.57 
"Statistical Tables . . .’’ 48.57 
acclimatization to altitude 34.422 
Behrens-Fisher tables 26.390a 
distribution of correlation coefficient 6.599, 
10.315, 14.655,661,668-669, 86.243 
fiducial probability 36.173a 
incomplete blocks 39.353 
rainfall and wheat 14.656,659 
regression coefficients 86.238 
Student's distribution 26.393 
Fourfold tables (see also Chisquare, Contin¬ 
gency tables) 

Bowley’s views 6.91, 7.4,9 
degrees of freedom 6.90-91, 7.a-9 
"exact method" 26.48-^9 
fiducial limits for cross ratio 26.50 
fixed margins 26.48-51 
Greenwood and Yule 6.90 
Pearson’s tables 6.91 
Pearson's views 6.91 (footnote) 

Yule’s experiments 7.6-7 
Frequency curves, definition includes hypo¬ 
thetical infinite population 10.311-312, 
11.700 





Functional equation 

for limiting distribution of extreme observa¬ 
tion 16.180,182 

for multiplicative process 19.209 

y (measure of skewness) 
definition 81.17 

in normal samples of 3 21.19-20 
lack of normality of 21.23-24 
moments for normality 
in samples of 3 81.18-20 
recursion relation 81.20-23 
product moments 81.25-28 
recursion relation for normality 81.17-18 
Gauss, K. F. 

inverse probability 88.628,531 
least squares 37.249,258 
method of moments 89.318 
normal curve 18.806 
Geiger, H.

alpha particles 4.330 
Gene frequencies and ratios 

distribution for rare mutations 19.205-220 
contribution to variance 19.218-219 
decay 19.207,209,212-213
effect of weak selection 19.215-218 
maintained 19.207-215 
functional equations 19.209-215 
Generalized distance 14.653a, 33.375a-386,
86.242-249 

Generalized variance 86.247 

measure of efficiency 88.138-185 
transformation 88.183
Generating function 

for cumulants 80.201,218,228 
for moments 20.201,216,217 

for new variable(s) 80.227-228, 21.27-28
for number of branches 41.396 
Genetics of populations 19.204a-220, 81.354a- 
369 

Geometric method 

and general regression 86.239 
applied to interclass correlation 1 
chisquare 6.85a,88,90, 18.354 
harmonic analysis 16.56-58 
joint distribution of mean and standard 
deviations 8.763-767
mean deviation 8.761
multiple correlation coefficient 14.657-660 
standard deviation 8.759 
Wishart distribution 86.244 
Geometrical probability, Stevens’ problem in 
relation to the test of significance in harmonic
analysis 87.14-17 
Gibbs, J. Willard
distribution problems 10.315 
Goodness of fit (see also Chisquare)
after use of moments 6.93 
and inefficient estimates 11.707 
by chisquare 


Goodness of fit, by chisquare (Continued) 

throwing adjacent like deviations together 
4.338 

negative binomials 88.187 
of part of distribution 4.337 
of regression lines 6.596a-607
Gosset (see Student)

Greenwood, M. 

chisquare 6.85a,90-91,94 
Group theory 

and confounding factorial experiments 
p levels 40.283-288 
two levels 89.341-353 
Grouped normal distributions 

and maximum likelihood 10.359-363 
and Sheppard’s correction 10.360-361 
efficiency of moments 10.362-363
loss in efficiency due to grouping 10.361-363 
Grouping 

average effect 80.3-4 

average effect on normal moments 10.319-320 
corrections 10.316-320,361-363,368, 29.306- 
314 

Group-invariant families of distributions 
84.296-307 

estimation of one parameter 34.296-303
location an example 84.297 
location and scale 84.303-307 

Haldane, J. B. S. 

negative binomial 88.182-187 
Hanseman, G. H. 

method of moments 89.310-312,318 
Harmonic analysis 
distribution 

cumulative 36.240 
mean 36.241 
table 86.242 
four distributions 16.58
tests of significance 16.54-59, 86.240-242,
87.14-17 
Fisher's 16.55-59 

approximate 16.58-59 
tables 16.59-59a 
Schuster’s 16.54-56 
second largest 87.16-17 
relation to Stevens’ problems 87.14-17 
Helmert, R. 

chisquare distribution 18.354,365 
Hermite polynomials 

definition and elementary properties 28.xxvi-
xxx

in expanding distributions 80.9-14a 
orthogonality 23.xxvii
Histogram 

frequency » area 38.188(figure) 

Hooker, R. H. 

weather and crops 8.128,135 
Horse-kick, deaths from 4.330,358 
Hotelling, H. 
formulae 80.209 





Hotelling, H. (Continued)

multivariate 33.376a-386, 36.242-246,249
Hotelling's T

and discriminant functions 33.378 
Hsu, P. L. 

latent roots 36.246 
Hypergeometric function
and multiple correlation coefficient 14.660

Inaccurate estimate of error
a disadvantage 28.191 
Incomplete blocks 89.353 
Index of precision 18.205 
Index of variability 

chisquare for Poisson 4.334-356, 12.807
Inference 

general 26.39-42, 27.245-258 
Infinite hypothetical population (see Population)

Information (see also Intrinsic accuracy)
amount of 11.710, 26.43-44, 27.250
and entropy 26.47
integral

variance of efficient estimate 8.445 (discrete
case)
irrelevant 10.312

matrix for negative binomial 38.184
recovery of lost 24.300-303
relevant 10.312
whole of 2.709 
Inoculation 

effect of 6.90-91
Interclass correlation
distribution 1 

Intrablock subgroup 39.3(2 
Intraclass correlation
distribution 1 

Intrinsic accuracy (see also Information)
additivity 11.710 
as defining efficiency 11.714 
definition 10.310, 11.709 
identified with information 26.47 
integrals 11.709 
loss 11.717-721 

by iterative approximation 11.722-723 
excess 11.721-723 
for Cauchy 11.720
minimal 11.718-720

prevention by use of ancillary statistics
11.724-725
of Cauchy 10.339 
of statistics 11.717 

fully accurate statistics, theoretical existence
11.723
of Type IV 10.341 

Invariance 24.303-307, 26.44, 37.183
Inverse probability 
application to binomial 10.324-327 
Bayes 10.324-327,368, 22.528, 24.285-286, 
26.392,398, 27.246,249 
Boole 10.311,326, 22.531, 27.248,258


Inverse probability (Continued) 

Chrystal 10.311,326, 22.531, 27.248 
De Morgan 27.247,258
Edgeworth 10.311,368, 27.248,254,258
effect of change of terms 10.304-307
Fisher’s early attitude 27.248
Gauss 22.528,531

history 10.311,320, 22.528,535, 27.245-264
Laplace 10.311,326, 22.528, 27.247-248,
258

Pearson 10.311,368, 27.248,258 
Poisson 10.311,326 
Price 22.528

Venn 10.311,326, 22.531, 27.248,254,258 
Iris

and discriminant functions 82.179-188
Isostatistical region
and maximum likelihood 10.330
definition 10.310
Isserlis, L.

sampling moments 20.200,238 
Iterative estimation 

approximate maximum likelihood 11.700-
702,708-709,722-723
locating the Cauchy 11.708-709

Jeans, Sir James

distribution problems 10.315
Jeffreys, Harold

negative binomial 38.183,187

k-statistics 20.199-238

and elementary symmetric functions 80.5
and power sums 30.5-7
general form 30.7 
definition 20.202-204, 80.4-5 
definition, multivariate 20.217 
distribution of ratios 21.15a-28 
Kelley, T. L.

normal table 8.443
Keynes, J. M. 

definition of probability 10.327,368
Kinds of samples 
enumeration 41.395 
moments of (in sampling) 20.199-238
of fixed weight 41.396
Koch, Robert
plate technique 4.326
Koshal, R. S.

Karl Pearson 29.302-318 
method of moments 29.302-318

Lange

criminal twins 26.48
Laplace, P. S.

inverse probability 10.311,326, 22.528, 
27.247-248,258 
normal curve 12.805 

Laplace distribution (see Double exponential 
distribution) 

Large sample theory 26.41-46 





Largest value

from exponential population 16.65 
ratio to sum 16.68-69 
Latent roots
distribution 36.242-248

from Wishart distribution 86.244-246 
normality of either set of variables 86.246 
Jacobian from moments 86.246,248
origin of 86.243
Latin squares 17.510-511
diagonal types

enumeration 41.400-401 
for potatoes 18.202-205 
gain in use 18.204-205 
nature of approximation 86.239-240 
random selection 18.202 
unsuitability for many treatments 18.200
Least squares 
history 27.240 
Lexis, W. 

divergence coefficient 12.807 
Likelihood 

contrasted with probability 22.532 
definition 10.310,326, 11.707, 22.532 
function 24.287-296 
mathematical 26.40, 27.246-250 
maximum (see Maximum likelihood)
not a differential element 10.327, 11.707 
solution 42.306a-307 
two new properties 24.285-307 
Limiting distribution(s)
extreme observations 16.180-183
moments 16.186-188
of z, t, and chisquare 18.806-809
Location 

by extreme observations 10.348-351 
definition 10.310 

example of group invariance 24.296-303 
in general 10.338-342, 24.303-307 
of Cauchy 10.338-339, 11.715-720 
by median 11.715-716
loss of intrinsic accuracy 11.720 
table of efficiency 11.716 
of double exponential 

by median 11.716-717, 84.298-300 
information 84.296-302 
use of configuration 84.301-302 
process 10.310 

Location and scaling 84.303-307 

distribution of maximum likelihood estimates 
for given configuration 84.305
example of group invariance 84.303-307
in general 10.338-342 
of normal 11.703-704
of Pearson curves 10.342-347 
of Type III 10.339 
of Type IV 10.341-342
Logarithmic scale

use to compare wheat yields 8.132-133
Lorentz, H. A.
distribution problems 10.315 


Mackenzie, W. A. 
figures 10.368 
plating method 4.325-350 
MacMahon, P. A.

combinatorics 41.306-401 
Macrolepidoptera 

Williams’ data on species abundance 43.56-57 
Mahalanobis, P. C.

generalized distance 14.653, 88.375a-386, 
36.242,249 

Mahalanobis’ generalized distance 
and B distribution 68.379 
and C distribution 38.380-381 
and discriminant function 88.375a,378-381 
Martin, E. S. 

discriminant functions 38.179 
Mathematicians 

pure and applied 10.323
Maung, K.

eye color 84.426 
Maximum likelihood 
and chisquare 9.95-100, 10.357-358 
linkage example 9.95-97 
and dilution series 10.363-366 
and grouped normal distribution 10.359-363 
and standard deviation 11.703-704 
applied to discrete distributions 10.356,366 
approximation by iteration 11.700-702,708-
709,722-723

as efficient solution of equation linear in the 
frequencies 9.97-98 

asymptotic variance 10.328-329, 26.45-46 
multiparameter case 10.332 
“defined” by linearity and efficiency 9.97-98 
distribution of estimates 
location 24.303 
location and scaling 84.305 
efficiency 

apparent 6.598 
for Type III 10.337 
when possible 11.710-711 
failure for Cauchy distribution 10.322 
fitting negative binomial 88.186 
original formulation in terms of inverse 
probability 10.326 

p, p', and pp' in separate experiments 48.306- 
307 

generalization 42.307 
Poisson distribution 10.359 
principle 10.323-324 
relation to Bayes 10.324-327 
yields sufficient estimates when they exist 
10.330-331 

Mean 

among Pearson curves optimal for normal 
alone 10.323 

use of other measure without criticism 10.323 
yields sufficient estimates when they exist 
10.330-331 
Mean deviation 

accuracy for the normal 2.758-770, 11.703 





Mean deviation (Continued) 
comparison with the standard deviation 
2.768-770 

in normal samples 8.767-768 
joint distribution with standard deviation 
2.763-767 

standard deviation of 2.760-762 
Mean error (see Mean deviation)

Mean square error (see Variance)

Median

advance of wave of genes 31.360,368
distribution 

for Cauchy 11.715 
for double exponential 11.716 
for normal 11.705-706 
efficiency for location
Cauchy 11.715-716 

double exponential 11.716-717, 24.207-300 
normal 8.445, 10.322, 11.705-706
invariance 22.530 

Method of moments (see also Location,
Pearson curves, Scaling)
accurate fitting 20.308,313,315 
efficiency 

grouped normal 10.362-363 
location and scaling Type IV 10.341-342 
negative binomial 38.183-185 
Pearson curves 10.332-355
Type III 10.332-337 

example of criterion of consistency 10.321 
failure for Cauchy 10.321-322 
heterotyposis 10.321 
Koshal and Pearson 29.303-318 
arithmetic errors 29.305-313 
Pearson 10.321,368, 29.302a-318 
place in teaching 29.315-317 
Midrange from rectangular population 
limiting distribution 10.349 
Milne, A.

sheep ticks 88.185 
Minimum chisquare 

efficiency 8.446-447, 11.721-722, 29.316 
Mitscherlich, E. A.

diminishing returns 8.115,135 
Moments (see also Cumulants, k-statistics,
Method of moments)
and characteristic function 30.1
and cumulants 30.3
and k-statistics 30.6-7
general 31.1-15

moments of (in sampling) 20.199-238
of measures of departure from normality 
21.16a-28 

of sample moments and k-statistics 20.199-
238 

of ultimate and penultimate distributions 
15.186 

Multinomial distribution 
asymptotic variance 5.88-89 
from categories 11.719 
in genetic example 9.95-96


Multinomial distribution (Continued) 
relation to Poisson 5.86a,89a
Multiple correlation 

and analysis of variance 18.811 
and discriminant functions 82.185 
coefficient 18.810-811 
distribution 

hypergeometric forms 14.660-663,666-668
non-central 14.654-673
limiting distribution 14.663-666

connection with two Poisson series 14.664-
665

table of 5% points 14.665
Multiplicative process 

functional equation 19.209 
Mutation 
advantageous 

diffusion 81.355-369 
dominance 31.356 
quantitative effect 31.356 
probability of spreading 19.215-217
Myers, R. J. 

method of moments 29.307-314 
Myosotis

wheat weed 3.130 

Negative binomial 

derived from Poisson 38.182, 43.54 
fit by moments

comparison with maximum likelihood
38.186-187
efficiency 88.183-185
information matrix 88.184 
limiting form when zeros are not observed 
43.54-58 

estimation 43.55-56 
testing fit 48.57 
variance of estimate 48.56 
Neyman, J. 

confidence limits 26.393,398 
testing hypotheses 84.294-296, 35.173a-173b
Neyman-Pearson theory 

relation to estimation 84.296 
uniformly most powerful tests 24.294-296 
Non-additive response 

potatoes to N and K 18.211-212
Non-central t 

asymptotic moments 23.xxxi-xxxii 
distribution 83.xxx-xxxiii 
Non-normality, measures of 

accuracy of approximations 80.234-235
definition and low moments (n*"*) 80.231- 
233,235-237 

low moments (exact) 21.16-28 
low moments related to moments of k
81.16-28 

Normal distribution (see many other entries) 
and de Moivre 13.362-365 
and Gauss 18.805 
and Laplace 12.805 
derivatives and integrals 28.xxvi-xxix





Normal distribution (Continued)
efficiency of median 11.705-707
Kelley table 6.443 
sample of three 21.19-20 
truncated 

accuracy 28.xxxiv 
fitting 23.xxxiii-xxxiv
Number of species 
distribution 46.52-67 
Numerical integration 

differential equation 61.361-367 

Observers 

comparison between 4.328 
by analysis of variance 4.328 
errors of counting 4.362-364 
Optimum value 

interclass correlation 1 
of a parameter 10.310 
Order of observation 

use in seeking trouble 4.338-340 
Orthogonal cube 
of 3 40.286-288 

and balanced incomplete blocks 40.286-288 
of 4 40.280-290 

and balanced incomplete blocks 40.289-290 
Orthogonal polynomials 8.111-120 
form 8.119 

reduction in variance 6.119-120
residuals 8.122-124 
correlation 6.124-126 
Owen, W. 

sugar refinery bacteria 4.347,360,366-360 

Page, A. J. 

computer 14.666 
Pairman, E. 

grouping corrections 10.(310)368, 29.306-308 
polygamma functions 10.368 
Pearson, Egon S. 

testing hypotheses 24.294-296, 86.173a-173b 
Pearson, Karl
array means 6.598 

chisquare 6.86a-94, 7.2,9, 10.314, 12.800,
14.654 

critical attacks 20.302a-318 
grouping corrections 10.319,368, 29.306-308 
inverse probability 10.311,368, 27.248,258
method of moments 10.321,368, 29.302-318
probable errors 10.316,329 
regression lines 6.697,603-607,612 
sampling moments 20.200,238
tables 3.135, 4.368, 6.91
Pearson curves (see also next seven entries)
contours for 

form finding by moments 10.351-365 
location by moments 10.346 
scaling by moments 10.347 
diagram 10.343-345 
fitting by moments 10.342-351 
general 10.342-346 


Pearson curves (Continued)
heterotyposis 10.346 
monumental 10.314 
regions of validity 

location by moments 10.343 
scaling by moments 10.340 
Pearson Type I (incomplete beta) 
occurrence 6.606 
Pearson Type II 

fitting by moments 10.364-355 
Pearson Type III (chisquare) 

approximation to Type VI 6.601-602 
center of location 10.330 
efficiency of mean 10.337 
estimation of exponent 

distribution of estimate 24.289-293 
sufficiency of estimate 24.289 
fitting by moments 10.332-337
location and scaling 10.339 
variance of estimate 10.332-337 
variance of scaling 10.340 
Pearson Type IV 
fitting by moments 10.340-341 
intrinsic accuracy 10.341 
location and scaling 10.341-342 
variance of location 10.341 
variance of scaling 10.341-342 
Pearson Type V 

fitting by moments 10.341-342 
Pearson Type VI (beta-prime) 
approximation to Type III 6.601-602
Pearson Type VII (Student's t)
fitting by moments 10.341-342,351-353 
occurrence 8.608-611 

Penultimate distributions (see Limiting distributions)

Peters, C. A. F. 

accuracy formula 2.767, 27.265 
Planck, Max 

distribution problems 10.315
Plating method for bacterial density 
accuracy 4.325-359 
as Poisson series 4.325-359 

abnormally high variability 4.333-347 
abnormally low variability 4.347-350,356- 
357 

Poisson, S. D. 

distribution 4.330,358, 12.805, 14.654 
inverse probability 10.311,326 
Poisson distribution and series 
and maximum likelihood 10.359 
sufficiency 10.367 

and negative binomial 86.182-184, 46.54 
chisquare index 4.334-358, 12.807 
chisquare test of homogeneity 4.334-348 
classic examples 4.330 
excessive variance 4.337-347,350-354 
history 12.805, 14.654 
modified 28.xxxiv-xxxv 
possesses sufficient statistic 11.713 
relation to B distribution 14.664 





Poisson distribution and series (Continued)
relation to chisquare distribution 13.357-358
relation to multinomial 5.86a,89a 
Soper’s tables 4.331,358 
subnormal variance 4.347-350,354 
as danger signal 4.347-350 
Polygonum 

wheat weed 8.130 
Polynomials 
orthogonal 3.111-126
Pooling 

immorality 18.204 
losses 18.204 

vs. non-pooling 36.174-176
Population, infinite hypothetical

conceptual resultant of conditions studied 
11.700 

connection with random sampling 11.700-701 
justification 10.312 

protection by tests of goodness of fit 10.314
Population genetics 19.205-207, 81.354a-369
Potato

response to potash and nitrogen 18.201-213
Precision of estimates 2.757,762
Price, Rev. Richard
inverse probability 22.528
Probable errors 10.315,329
Probability

Bayes' definition 27.245,258
Keynes' definition 10.327,368 
most elementary statistical concept 10.312 
Probability integral (normal) 

integrals and derivatives 28.xxvii-xxx 
Probability points 
in terms of cumulants 30.10-11,13 
Problem of the Nile 27.257
Protozoa (see Bacterial density) 

Purpose of statistics 10.311 

Q2 9.96-99

Radioactive decay 4.330,358

as “a method for testing the laws of probability”
4.330

Rainfall and wheat 14.656,659 
Randolph, L. F. 

Iris 32.185,188
Random samples 

non-existent as such 11.701 
Random vs. systematic arrangements 28.186- 
193 

on Wiebe’s data 28.189-193 
in pairs 28.191-192 
in sandwiches 28.191-192 
significance tests 28.192-193 
Randomized blocks 17.509-510 
Rare mutations 

and equilibrium distribution of gene ratios 
19.205-220 

Rectangular distributions 

superefficient estimation 10.348-351


Recurrence formula 

for measures of non-normality 21.17-25 
related to gene ratios 10.210-215 
Region of validity 
definition 10.310 
Regression

and analysis of variance 36.238-239
and discriminant functions 82.184-185,
88.377-378
coefficient

how to specify errors 84.422, 36.238
goodness of fit of 6.596a-607, 12.812
and analysis of variance 12.812
Pearson’s method 6.603-607
Slutsky's method 6.603-607
multiple, distribution 6.611, 84.422
non-orthogonal case 6.611 
non-linear 

classical least squares 36.239-240 
curvature of manifolds 86.230
geometry of residuals 86.239
like linear 36.238 

of observed variance on mean 4.333 
polynomial, residuals 

serial correlation 3.125-126 
variation of variance with position 8.123 
simple, distribution 6.607-610 
testing with t 6.607-610
Rejection of observations 
crude 10.322 
Replication 
purpose 17.505 

secondary purpose 17.506 
Rings (combinatorial, enumeration) 
of branches 41.399 
of parts 41.397-398 
Roy, S. N. 

generalized distance 14.653a
Russell, Sir John 

experimental design 17.506-513 
Rutherford, Sir Ernest
alpha particles 4.330 

Sample distribution (see name of statistic or 
underlying distribution) 

Sample moments 

moments of 20.199-238 
Sandon, H.

soil bacteria 4.350
Scaling

from normal samples (optimum exponent) 
2.762-763 

location and scaling 

distribution of estimates easily derivable 
24.305 

no loss in information 24.305 
Pearson curves 10.332-342 
of Cauchy 

maximum likelihood 11.707-709 
of normal 11.703-704 
process 10.310 





Schuster, Sir Arthur
harmonic analysis 16.54
Score 

constant over sets of samples 24.280 
first appearance 11.723 
in negative binomial near limit 48.57 
Sheep ticks 88.185-186 
Sheppard, W. F.

grouping corrections 10.317,361-363, 29.307- 
314 

sampling moments 20.198a,200,238
standard errors 10.320,368 
Sheppard’s corrections 

and maximum likelihood 10.360-361 
derivation 10.318-319 
periodic terms 10.318-320 
Significance of results 17.503-505 
Slutsky, E. 

regression lines 6.597,603-607,612 
Smith, K. 

goodness of fit 10.360,368
Soil bacteria 4.350-352,359
Soil heterogeneity
and block size 18.208
Soil protozoa 4.350-352,359
Sonchus 

wheat weed 8.130 
Soper, H. E. 

Poisson tables 4.331,358 
sampling moments 20.200,238 
Species 

abundance of 

theoretical distribution 48.54-58
Specification 

problems of 10.310,313-314 
Standard errors 

of estimate 10.329,368 
Statistic 

definition 11.701 

early uses 4.332, 6.598, 6.444, 9.95,97
efficient 

correlation of two such 8.445-446 
correlation with inefficient 8.446
maximum likelihood 8.445 
Stellaria 

wheat weed 8.131 
Stevens, W. L. 

geometrical probability 84.14-17
Stocking, W. A.

B. coli 4.350,352,355,350
Student (= William Sealy Gosset)
correlation coefficient 12.811, 14.656 
founder of exact tests 27.251,258 
founder of fiducial 26.391,398 
haemocytometer 4.330,358 
his contribution 27.251-252 
mean square 12.807 
regression 6.608-612
sampling moments 20.198a,200
systematic designs 17.508,513, 28.189-193
t 10.315,368, 12.808-809, 18.355,365, 14.655


Student (Continued)
table 18.360-365 
unique sample 88.175,180 
Student's t 

distribution 18.356 
generalized 88.378 

history 10.315, 12.808-809, 14.655, 28.391, 
27.251,255, 85.175 

in Behrens-Fisher problem 26.393-395 
non-central 

asymptotic moments 23.xxxi-xxxii
distribution 28.xxx-xxxiii 
probability integral 18.358-360 
finite cases 18.359-360 
relation to binomial 18.360 
relation to normal distribution 12.808-809 
relation to z distribution 12.808-809 
uses sufficient statistics 27.255
Sufficiency 

criterion of 8.598, 10.310,316-317 
idea 2.768-769 

in terms of factorization 10.331 
of variance 10.315 

Sufficient statistics 11.712-714, 27.250 
and factorization 11.713-714 
definition 24.288, 26.53 
distribution, general form 24.288,293-294
found by maximum likelihood 11.714
Poisson series 11.713
relation to UMP estimates 24.294-296
restriction in nature 11.714
Sugar refinery bacteria 4.347,350,355-359
SUKHATME, P. V. 

Behrens-Fisher tables 86.175-180 
combinatorics 41.395a,401 
Symbolical figures 20.232-233 

and two-way partitions 20.232-233 
Symmetric functions (see Cumulants, k-statistics,
Moments)

Systematic designs 
potatoes 18.206-207 
Student 17.508,513, 26.393, 28.189-193 
squares 28.189,193
wheat 28.189-193 

t (see Student's t)
Tables 

B 14.665, 88.380
extreme values 16.188 
gene diffusion 81.366 
Hermite polynomials 80.12
higher approximations 80.12 
information about k in negative binomial 
near k = 0 48.57

log u — log V, where ue ■■ s'* — 1 48.56 
non-central chisquare 14.665 
of efficiency of median in location 
Cauchy 11.716 
double exponential 11.716 
of efficiency of moments in fitting 
Type II 10.355 





Tables, of efficiency of moments in fitting
(Continued)

Type VII 10.353 
of 6% levels 

test of significance in harmonic analysis 
16.59 

relating to multiple correlation coefficient 
14.665 

significance in harmonic analysis 16.59-59a 
95% level of r for four pairs of observations
23.533 

Tchouproff, A. A.

sampling moments 20.198a,200,215,238 
Teaching of statistics 

method of moments 29.315-317 
needed subjects 29.315-317
Tedin, O. 

systematic squares 28.189,193 
Terminology, confusion of
population and sample 10.311
Testing hypotheses 24.294-296, 86.173a-173b
Tetrahedron

decomposition of generalized 16.56-59
Theory of estimation 10.308a-368, 11.699a-
725, 26.40-51

foundation by Gauss 27.249
Thiele, T. N.

method of moments 29.318
seminvariants 20.198a
Thornton, H. G.
agar media 4.325-359 
plating method 4.325-350 
Time series 

crop yields 8.106-135 
orthogonal polynomials 8.119-126 
Tippett, L. H. C.

extreme values 16.180-190
Tocher, J. F.

differential death-rates 6.93-94
eye color 84.426
Transformation 

angular 10.325, 19.205-220 
effect on generating function 20.227-228,
21.27-28

effect on mean, mode, and median 22.529-530
of correlation coefficient (see Correlation,
transformation)

Travers, R. M. W.

discriminant functions 83.376-386 
Trigamma function 24.291-293 
Turner, A. J. 

method of moments 29.304-313 

Ultimate distributions (see Limiting distributions)

UMP estimates 

relation to sufficient statistics 24.294-296 
Undetermined coefficients 

ratio distribution for harmonic analysis 
16.57-58

Uniform distribution on a simplex 87.15 


Unique sample 
need for its analysis 86.175 
Student and the 36.175-180 

Variance (see also Estimation, Location,
Method of Moments, Pearson curves, 
Scaling, etc.) 
component 

fiducial distribution 26.397-398 
conditional 48.56 
divided into parts 8.110 
early occurrence 2.762, 8.110, 4.332 
estimation from several arrays 6.599-600
generalized

measure of efficiency 38.183-185
transformation 88.183-185
genetic components 19.218-220 
joint distribution with mean deviation 2.763- 
770 

of estimate 26.42 

optimum among powers 2.762-763
ratio

approximation by chisquare and corrections
6.601-602
distribution 6.600-601
reduction by polynomial fitting 3.119-120
“relative” 2.702
sufficient 10.315
Vega, G.

logarithmic tables 10.350,368
Venn, J. A.

inverse probability 10.311,326, 22.531, 27.248,
254,258 

Walker, Gilbert

harmonic analysis 16.55
Wallace, N.

discriminant functions 88.370,386
Weather and crops 8.106a-135, 14.656-659
Weeds

in wheat 8.129-133 
Wheat yield and rainfall 14.656-659 
causes of change 3.126-135 
diminution 8.115-118 
slow changes 8.107-115 
Wiebe’s data 28.189-193 
Whittaker, L.

deaths from horse-kick 4.330,358
Wiebe, G. A.

wheat yields 28.189-193 
Wilks, S. S. 

Behrens-Fisher problem 86.173b
(as Wilkes) generalized variance 86.247 
Williams, C. B. 

number of species 43.53a-57
Winter oats 

in randomized blocks 17.511-513
Wishart, John

his distribution 36.244,249
sampling moments 20.200,209,238
Wishart distribution, form and proofs 86.244





Wright, Sewall
population genetics 19.205-207 
Wyant, Z. N. 
soil bacteria 4.326,350 

Yates, Frank
confounding 89.341,353

statistical tables 48.57 
Yule, G. Udny 
association 10.368 
chisquare 4.338, 6.86a,90,94, 7.6-8 

z distribution 18.805-813


z distribution (Continued)
approximate percentage points 80.12-14a
distribution 18.355-365
and analysis of variance 12.810-812
limiting cases 13.808-813 
finite cases 18.361-362 
limiting cases 18.362-365 
relation to binomials 18.362-364 
history 14.655-656 
in Behrens-Fisher derivation 86.177 
probability integral 18.360-365 
z transformation (see Correlation, transformation)




